[ https://issues.apache.org/jira/browse/IMPALA-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-11668:
---------------------------------
    Description: 
In IMPALA-11666, it has been shown that it's possible to create a transactional 
ORC table with 0 rows but a greater-than-zero file size even though there are no 
corrupt statistics.

In this case, Impala's frontend computes an estimated number of rows 
instead of using the exact number of rows (i.e., 0) in 
[HdfsScanNode()#getStatsNumRows()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1575].
As a result, the returned number of rows is inaccurate.

It would be good if Impala's frontend could determine whether there are 
indeed corrupt statistics associated with a table whose number of rows is 0 
and whose associated file size is greater than 0.
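To make the problem concrete, the following is a minimal Java sketch of the check described above, assuming the frontend treats "0 rows but more than 0 bytes" as a sign of corrupt statistics and falls back to an estimate. The class `StatsSanityCheck`, its methods, and the `estimatedRows` parameter are hypothetical and are not the actual code in `HdfsScanNode`; how Impala derives its estimate is deliberately left out.

```java
// Hypothetical sketch: all names are illustrative, not Impala's actual API.
class StatsSanityCheck {

  /**
   * Hypothetical heuristic for the condition described in IMPALA-11668:
   * a stored row count of 0 combined with more than 0 bytes of files is
   * treated as possibly corrupt statistics.
   */
  static boolean looksCorrupt(long numRows, long totalFileBytes) {
    return numRows == 0 && totalFileBytes > 0;
  }

  /**
   * Returns the row count used under this sketch: the stored count when it
   * looks trustworthy, otherwise the caller-supplied (placeholder) estimate.
   */
  static long statsNumRows(long numRows, long totalFileBytes, long estimatedRows) {
    return looksCorrupt(numRows, totalFileBytes) ? estimatedRows : numRows;
  }

  public static void main(String[] args) {
    // An empty transactional ORC table as in IMPALA-11666: 0 rows, but its
    // files have a non-zero size. Both numbers below are made up.
    long numRows = 0;
    long fileBytes = 512;
    long estimate = 10; // hypothetical estimate
    System.out.println(looksCorrupt(numRows, fileBytes));
    System.out.println(statsNumRows(numRows, fileBytes, estimate));
  }
}
```

Running `main` prints `true` and then `10`: the exact row count of 0 is discarded in favor of the estimate, which is the inaccuracy this issue asks to investigate.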

  was:
In IMPALA-11666, it has been shown that it's possible to create a transactional 
ORC table with 0 rows but a greater-than-zero file size even though there are no 
corrupt statistics.

In this case, Impala's frontend computes an estimated number of rows 
instead of using the exact number of rows (0 in this case) in 
[HdfsScanNode()#getStatsNumRows()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1575].
As a result, the returned number of rows is inaccurate.

It would be good if Impala's frontend could determine whether there are 
indeed corrupt statistics associated with a table whose number of rows is 0 
and whose associated file size is greater than 0.


> Investigate whether Impala's frontend could still use the statistics for an 
> empty table with greater-than-zero file size
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11668
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11668
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)