Quanlong Huang created IMPALA-13103:
---------------------------------------

             Summary: Corrupt column stats are not reported
                 Key: IMPALA-13103
                 URL: https://issues.apache.org/jira/browse/IMPALA-13103
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Quanlong Huang


Impala reports corrupt table stats in the query plan, but corrupt column
stats are not reported. For instance, consider the following table:
{code:sql}
create table t1 (id int, name string);
insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');{code}
with the following stats:
{code:sql}
alter table t1 set tblproperties('numRows'='4');
alter table t1 set column stats name ('numNulls'='0');{code}
Note that column "id" has missing stats and column "name" has missing/corrupt 
stats (ndv=-1, numNulls=0).
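The stats state described above can be checked directly before running the queries. This is a minimal sketch, assuming the table lives in the default database. Output is omitted because it was not part of the original report:
{code:sql}
-- Table-level stats: #Rows should reflect the manually set numRows=4.
show table stats t1;
-- Column-level stats: per the description above, "id" should have no stats
-- and "name" should have numNulls=0 while ndv is still -1.
show column stats t1;{code}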
Grouping by "id" will report the missing stats:
{code:sql}
explain select id, count(*) from t1 group by id;

WARNING: The following tables are missing relevant table and/or column statistics.
default.t1{code}
However, grouping by "name" doesn't report the missing/corrupt stats. The plan shows "columns: all" under "stored statistics" and prints no warning:
{noformat}
explain select name, count(*) from t1 group by name;
+-------------------------------------------------------------------------------------------+
| Explain String                                                                            |
+-------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=38.00MB Threads=2                               |
| Per-Host Resource Estimates: Memory=144MB                                                 |
| Codegen disabled by planner                                                               |
| Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name                     |
|                                                                                           |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                     |
| |  Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2 |
| PLAN-ROOT SINK                                                                            |
| |  output exprs: name, count(*)                                                           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0    |
| |                                                                                         |
| 01:AGGREGATE [FINALIZE]                                                                   |
| |  output: count(*)                                                                       |
| |  group by: name                                                                         |
| |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
| |  tuple-ids=1 row-size=20B cardinality=4                                                 |
| |  in pipelines: 01(GETNEXT), 00(OPEN)                                                    |
| |                                                                                         |
| 00:SCAN HDFS [default.t1]                                                                 |
|    HDFS partitions=1/1 files=1 size=24B                                                   |
|    stored statistics:                                                                     |
|      table: rows=4 size=unavailable                                                       |
|      columns: all                                                                         |
|    extrapolated-rows=disabled max-scan-range-rows=4                                       |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1                       |
|    tuple-ids=0 row-size=12B cardinality=4                                                 |
|    in pipelines: 00(GETNEXT)                                                              |
+-------------------------------------------------------------------------------------------+
{noformat}
CC [~rizaon]


