[ https://issues.apache.org/jira/browse/IMPALA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-10085:
-----------------------------------
Component/s: Catalog
> Table level stats are not honored when partition has corrupt stats
> ------------------------------------------------------------------
>
> Key: IMPALA-10085
> URL: https://issues.apache.org/jira/browse/IMPALA-10085
> Project: IMPALA
> Issue Type: Sub-task
> Components: Catalog
> Reporter: Sahil Takiar
> Priority: Minor
>
> This is more of an edge case of IMPALA-9744: when any partition in a
> table has corrupt stats, the table-level stats are not honored. On the
> other hand, if partitions merely have missing stats, the table-level stats
> are honored.
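The distinction between missing and corrupt stats can be made concrete with a small helper. This is a hypothetical Java sketch, not Impala code; the class `RowCountClassifier` and its enum are invented for illustration. It assumes, as the listings in this report do, that a row count of -1 means "stats missing", and it treats any other negative value (such as -438290 for partcol=2) as corrupt.

```java
/**
 * Hypothetical helper, not Impala code: classifies a partition's #Rows value.
 * Assumption: -1 is the "stats missing" sentinel used in the listings of this
 * report; any other negative value (e.g. -438290) is treated as corrupt.
 */
public class RowCountClassifier {
    public enum State { VALID, MISSING, CORRUPT }

    public static State classify(long numRows) {
        if (numRows >= 0) {
            return State.VALID;   // a usable row count, including 0
        }
        if (numRows == -1) {
            return State.MISSING; // sentinel meaning "no stats computed"
        }
        return State.CORRUPT;     // any other negative value cannot be a row count
    }

    public static void main(String[] args) {
        // #Rows of partcol=1, 2 and 3 in the first SHOW PARTITIONS listing.
        long[] partitionRows = {-1, -438290, 3};
        for (long rows : partitionRows) {
            System.out.println(rows + " -> " + classify(rows));
        }
    }
}
```

Running `main` prints `-1 -> MISSING`, `-438290 -> CORRUPT` and `3 -> VALID`, one per line.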
> Given a partitioned table with the following partitions and row
> counts:
> {code}
> [localhost:21000] default> show partitions part_test;
> Query: show partitions part_test
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | 2       | -438290    | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
> | 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
> | Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> {code}
> The query {{explain select * from part_test order by col limit 10}} causes
> {{HdfsScanNode#getStatsNumRows}} to return 5, ignoring the table-level row
> count of 1001000000.
> Now take the following partitions, which differ from the above only in that
> partition 2 has missing stats (-1) instead of corrupt stats (-438290):
> {code}
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | 2       | -1         | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
> | 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
> | Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
> +---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> {code}
> With these partitions, the same query causes the same method to return
> 1001000000, i.e. the table-level row count.
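As a rough illustration of the behaviour the report expects, the following hypothetical Java sketch treats a corrupt partition row count exactly like a missing one: if any partition lacks a usable count, it falls back to the table-level count. It is a simplified model, not Impala's {{HdfsScanNode#getStatsNumRows}} logic, and the names `ExpectedNumRows` and `estimate` are invented. Under that assumption both partition listings above resolve to 1001000000.

```java
/**
 * Hypothetical, simplified model (not Impala's HdfsScanNode code) of the
 * behaviour this report expects: corrupt partition stats are treated like
 * missing ones, so the table-level row count is honored in both cases.
 */
public class ExpectedNumRows {
    /** Returns a row-count estimate, or -1 if nothing usable is known. */
    public static long estimate(long[] partitionRows, long tableRows) {
        long sum = 0;
        boolean allValid = true;
        for (long rows : partitionRows) {
            if (rows < 0) {
                // -1 (missing) and other negatives (corrupt) are handled alike.
                allValid = false;
            } else {
                sum += rows;
            }
        }
        if (allValid) {
            return sum; // assumption: every partition has stats, so trust the sum
        }
        return tableRows >= 0 ? tableRows : -1; // fall back to table-level stats
    }

    public static void main(String[] args) {
        long tableRows = 1001000000L;
        // First listing: partcol=2 has corrupt stats (-438290).
        System.out.println(estimate(new long[] {-1, -438290, 3}, tableRows));
        // Second listing: partcol=2 has missing stats (-1).
        System.out.println(estimate(new long[] {-1, -1, 3}, tableRows));
    }
}
```

Running `main` prints 1001000000 twice, once per listing. Returning the plain sum only when every partition has valid stats is a choice made for this sketch; the real planner may combine partition-level and table-level stats differently.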
--
This message was sent by Atlassian Jira
(v8.3.4#803005)