Sahil Takiar created IMPALA-10085:
-------------------------------------
Summary: Table level stats are not honored when partition has corrupt stats
Key: IMPALA-10085
URL: https://issues.apache.org/jira/browse/IMPALA-10085
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
This is an edge case of IMPALA-9744: when any partition in a table has corrupt
stats, the table-level stats are not honored. In contrast, when partitions
only have missing stats, the table-level stats are honored.
Given a partitioned table with the following partitions and their row
counts:
{code:java}
[localhost:21000] default> show partitions part_test;
Query: show partitions part_test
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -438290    | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}
The query {{explain select * from part_test order by col limit 10}} will cause
{{HdfsScanNode#getStatsNumRows}} to return 5.
Given the same table with different row counts, where partition 2 has missing
stats (-1) instead of corrupt stats:
{code}
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -1         | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}
For the same query, the same method returns 1001000000, which matches the
table-level row count in the Total row.
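To make the two cases concrete, here is a minimal sketch of one reading of the behavior this issue asks for: fall back to the table-level row count whenever any partition's row count is missing or corrupt. This is not Impala's {{HdfsScanNode}} code. The class and method names ({{PartitionStatsSketch}}, {{classify}}, {{estimateNumRows}}) are hypothetical, and the rule that -1 means missing while any other negative value means corrupt is an assumption taken from the row counts shown above.

```java
/**
 * Hypothetical sketch, not Impala's HdfsScanNode code. All names here
 * are invented for illustration only.
 */
final class PartitionStatsSketch {
    enum StatsState { VALID, MISSING, CORRUPT }

    /** Assumption: -1 means "no stats"; any other negative count is corrupt. */
    static StatsState classify(long numRows) {
        if (numRows == -1) return StatsState.MISSING;
        if (numRows < -1) return StatsState.CORRUPT;
        return StatsState.VALID;
    }

    /**
     * Sums per-partition row counts when every partition has valid stats.
     * Otherwise returns the table-level count, treating missing and corrupt
     * partition stats the same way.
     */
    static long estimateNumRows(long[] partitionRows, long tableRows) {
        long sum = 0;
        for (long rows : partitionRows) {
            if (classify(rows) != StatsState.VALID) {
                return tableRows; // fall back to table-level stats
            }
            sum += rows;
        }
        return sum;
    }

    public static void main(String[] args) {
        long tableRows = 1001000000L;
        long[] withCorrupt = {-1, -438290, 3}; // first table above
        long[] withMissing = {-1, -1, 3};      // second table above
        System.out.println(estimateNumRows(withCorrupt, tableRows));
        System.out.println(estimateNumRows(withMissing, tableRows));
    }
}
```

Under these assumptions both inputs yield the table-level count of 1001000000, whereas the report observes 5 for the first input today.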