[ 
https://issues.apache.org/jira/browse/IMPALA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10085:
-----------------------------------
    Component/s: Catalog

> Table level stats are not honored when partition has corrupt stats
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10085
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10085
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Sahil Takiar
>            Priority: Minor
>
> This is more of an edge case of IMPALA-9744: when any partition in a 
> table has corrupt stats, the table-level stats are not honored. In 
> contrast, if partitions merely have missing stats, the table-level stats 
> are honored.
> Given a partitioned table with the following partitions and their row 
> counts:
> {code:java}
> [localhost:21000] default> show partitions part_test;
> Query: show partitions part_test
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | 2       | -438290    | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
> | 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
> | Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
>  {code}
> The query {{explain select * from part_test order by col limit 10}} will 
> cause {{HdfsScanNode#getStatsNumRows}} to return 5.
> Given the following set of partitions with different row counts than above:
> {code}
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | 2       | -1         | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
> | 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
> | Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> {code}
> The same method returns 1001000000.
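
The difference between the two partition listings can be sketched as a classification of the #Rows values. This is only an illustration, not Impala's actual code: the class and method names below are hypothetical, and the convention that -1 means "missing" while any value below -1 means "corrupt" is an assumption inferred from the two examples above.

```java
import java.util.List;

// Hypothetical sketch, not taken from the Impala source tree.
public class PartitionStatsSketch {
    enum StatsState { VALID, MISSING, CORRUPT }

    // Assumption: -1 means no stats were computed; values below -1 are corrupt.
    static StatsState classify(long numRows) {
        if (numRows == -1) return StatsState.MISSING;
        if (numRows < -1) return StatsState.CORRUPT;
        return StatsState.VALID;
    }

    // True if at least one partition row count is classified as corrupt.
    static boolean anyCorrupt(List<Long> partitionRowCounts) {
        return partitionRowCounts.stream()
                .anyMatch(n -> classify(n) == StatsState.CORRUPT);
    }

    public static void main(String[] args) {
        // #Rows columns from the first and second partition listings.
        List<Long> first = List.of(-1L, -438290L, 3L);
        List<Long> second = List.of(-1L, -1L, 3L);
        System.out.println("first listing has corrupt stats: " + anyCorrupt(first));
        System.out.println("second listing has corrupt stats: " + anyCorrupt(second));
    }
}
```

Under that assumption, only the first listing contains a corrupt partition, which matches the case where the table-level row count of 1001000000 is not used.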



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

