Sahil Takiar created IMPALA-10084:
-------------------------------------

             Summary: Display the number of estimated rows for a table
                 Key: IMPALA-10084
                 URL: https://issues.apache.org/jira/browse/IMPALA-10084
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Sahil Takiar


AFAICT, there is no way to determine the number of rows estimated for a table 
when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4), 
(5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | 
Incremental stats | Location                                   |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false     
        | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String                                                                
     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                    
     |
| Per-Host Resource Estimates: Memory=32MB                                      
     |
| WARNING: The following tables are missing relevant table and/or column 
statistics. |
| default.test                                                                  
     |
|                                                                               
     |
| PLAN-ROOT SINK                                                                
     |
| |                                                                             
     |
| 02:MERGING-EXCHANGE [UNPARTITIONED]                                           
     |
| |  order by: col ASC                                                          
     |
| |  limit: 10                                                                  
     |
| |                                                                             
     |
| 01:TOP-N [LIMIT=10]                                                           
     |
| |  order by: col ASC                                                          
     |
| |  row-size=4B cardinality=3                                                  
     |
| |                                                                             
     |
| 00:SCAN HDFS [default.test]                                                   
     |
|    HDFS partitions=1/1 files=1 size=10B                                       
     |
|    row-size=4B cardinality=3                                                  
     |
+------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                
       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                    
       |
| Per-Host Resource Estimates: Memory=32MB                                      
       |
| WARNING: The following tables are missing relevant table and/or column 
statistics.   |
| default.test                                                                  
       |
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 
AS       |
| TINYINT)                                                                      
       |
|                                                                               
       |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
       |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B 
thread-reservation=1     |
|   PLAN-ROOT SINK                                                              
       |
|   |  output exprs: col                                                        
       |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
       |
|   |                                                                           
       |
|   02:MERGING-EXCHANGE [UNPARTITIONED]                                         
       |
|      order by: col ASC                                                        
       |
|      limit: 10                                                                
       |
|      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0             
       |
|      tuple-ids=1 row-size=4B cardinality=3                                    
       |
|      in pipelines: 01(GETNEXT)                                                
       |
|                                                                               
       |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
       |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                  
       |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
       |
|   01:TOP-N [LIMIT=10]                                                         
       |
|   |  order by: col ASC                                                        
       |
|   |  mem-estimate=12B mem-reservation=0B thread-reservation=0                 
       |
|   |  tuple-ids=1 row-size=4B cardinality=3                                    
       |
|   |  in pipelines: 01(GETNEXT), 00(OPEN)                                      
       |
|   |                                                                           
       |
|   00:SCAN HDFS [default.test, RANDOM]                                         
       |
|      HDFS partitions=1/1 files=1 size=10B                                     
       |
|      stored statistics:                                                       
       |
|        table: rows=unavailable size=unavailable                               
       |
|        columns: unavailable                                                   
       |
|      extrapolated-rows=disabled max-scan-range-rows=unavailable               
       |
|      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         
       |
|      tuple-ids=0 row-size=4B cardinality=3                                    
       |
|      in pipelines: 00(GETNEXT)                                                
       |
+--------------------------------------------------------------------------------------+
[localhost:21000] default> create table part_test (col int) partitioned by 
(partcol int);
[localhost:21000] default> alter table part_test add partition (partcol=1);
[localhost:21000] default> insert into table part_test partition (partcol=1) 
values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | 
Incremental stats | Location                                                  |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | 
false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total   | -1    | 1      | 10B  | 0B           |                   |        | 
                  |                                                           |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> show partitions part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | 
Incremental stats | Location                                                  |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | 
false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total   | -1    | 1      | 10B  | 0B           |                   |        | 
                  |                                                           |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> explain select * from part_test order by col limit 
10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                
       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                    
       |
| Per-Host Resource Estimates: Memory=32MB                                      
       |
| WARNING: The following tables are missing relevant table and/or column 
statistics.   |
| default.part_test                                                             
       |
| Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT 
CAST(10     |
| AS TINYINT)                                                                   
       |
|                                                                               
       |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
       |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B 
thread-reservation=1     |
|   PLAN-ROOT SINK                                                              
       |
|   |  output exprs: col, partcol                                               
       |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
       |
|   |                                                                           
       |
|   02:MERGING-EXCHANGE [UNPARTITIONED]                                         
       |
|      order by: col ASC                                                        
       |
|      limit: 10                                                                
       |
|      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0             
       |
|      tuple-ids=1 row-size=8B cardinality=1                                    
       |
|      in pipelines: 01(GETNEXT)                                                
       |
|                                                                               
       |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
       |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                  
       |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
       |
|   01:TOP-N [LIMIT=10]                                                         
       |
|   |  order by: col ASC                                                        
       |
|   |  mem-estimate=8B mem-reservation=0B thread-reservation=0                  
       |
|   |  tuple-ids=1 row-size=8B cardinality=1                                    
       |
|   |  in pipelines: 01(GETNEXT), 00(OPEN)                                      
       |
|   |                                                                           
       |
|   00:SCAN HDFS [default.part_test, RANDOM]                                    
       |
|      HDFS partitions=1/1 files=1 size=10B                                     
       |
|      stored statistics:                                                       
       |
|        table: rows=unavailable size=unavailable                               
       |
|        partitions: 0/1 rows=1                                                 
       |
|        columns missing stats: col                                             
       |
|      extrapolated-rows=disabled max-scan-range-rows=unavailable               
       |
|      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         
       |
|      tuple-ids=0 row-size=8B cardinality=1                                    
       |
|      in pipelines: 00(GETNEXT)                                                
       |
+--------------------------------------------------------------------------------------+
{code}

It would be nice if there was a field in the explain plan similar to 
"extrapolated-rows" that lists the estimated row count for the table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to