Sahil Takiar created IMPALA-10085:
-------------------------------------

             Summary: Table level stats are not honored when partition has corrupt stats
                 Key: IMPALA-10085
                 URL: https://issues.apache.org/jira/browse/IMPALA-10085
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Sahil Takiar


This is more of an edge case of IMPALA-9744: when any partition in a table has 
corrupt stats, the table-level stats are not honored. On the other hand, if the 
partitions merely have missing stats, the table-level stats are honored.

Consider a partitioned table with the following partitions and row counts:

{code}
[localhost:21000] default> show partitions part_test;
Query: show partitions part_test
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -438290    | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}

For this table, the query {{explain select * from part_test order by col limit 10}} 
causes {{HdfsScanNode#getStatsNumRows}} to return 5, so the table-level row count 
is ignored.

Now consider the same partitions, except that partition 2 has missing stats 
({{-1}}) instead of corrupt stats:

{code}
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows      | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1         | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -1         | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3          | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 1001000000 | 3      | 22B  | 0B           |                   |        |                   |                                                           |
+---------+------------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
{code}

For the same query, the same method returns 1001000000, the table-level row 
count.
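To make the expected behavior concrete, here is a small, hypothetical Java model. It is not Impala's {{HdfsScanNode}} code, and the class and method names are made up for illustration. It assumes that a row count of -1 means missing stats, that any other negative value means corrupt stats, and that the scan covers every partition (as the query above does, since it has no WHERE clause). Under that model, corrupt partition stats are treated like missing ones, so the table-level count is used for both partition sets above.

{code:java}
/**
 * Hypothetical model of the expected behavior, NOT Impala's HdfsScanNode logic:
 * if any partition's row count is unusable (missing or corrupt), fall back to
 * the table-level row count instead of summing partition row counts.
 * Assumes the scan covers every partition of the table.
 */
final class PartitionStatsSketch {
  // Assumption: -1 marks missing stats; any other negative value is corrupt.
  static final long MISSING = -1L;

  static boolean isMissing(long numRows) { return numRows == MISSING; }

  static boolean isCorrupt(long numRows) { return numRows < MISSING; }

  /** tableNumRows is the table-level count, or -1 if the table has no stats. */
  static long estimateNumRows(long[] partitionNumRows, long tableNumRows) {
    boolean anyUnusable = false;
    long sum = 0;
    for (long n : partitionNumRows) {
      if (isMissing(n) || isCorrupt(n)) {
        anyUnusable = true;  // corrupt is handled exactly like missing
      } else {
        sum += n;
      }
    }
    if (anyUnusable) {
      // Honor table-level stats when present; otherwise report "unknown".
      return tableNumRows >= 0 ? tableNumRows : MISSING;
    }
    return sum;
  }

  public static void main(String[] args) {
    long tableNumRows = 1001000000L;
    long[] withCorrupt = {-1L, -438290L, 3L};  // first partition set above
    long[] missingOnly = {-1L, -1L, 3L};       // second partition set above
    System.out.println(estimateNumRows(withCorrupt, tableNumRows));
    System.out.println(estimateNumRows(missingOnly, tableNumRows));
  }
}
{code}

Running {{main}} prints 1001000000 for both partition sets. A real fix would also have to consider partition pruning, where the table-level count does not describe the scanned subset.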



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
