Sahil Takiar created IMPALA-10084:
-------------------------------------
Summary: Display the number of estimated rows for a table
Key: IMPALA-10084
URL: https://issues.apache.org/jira/browse/IMPALA-10084
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
AFAICT, there is no way to determine the number of rows estimated for a table
when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4),
(5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |
Incremental stats | Location |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT | false
| hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String
|
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3
|
| Per-Host Resource Estimates: Memory=32MB
|
| WARNING: The following tables are missing relevant table and/or column
statistics. |
| default.test
|
|
|
| PLAN-ROOT SINK
|
| |
|
| 02:MERGING-EXCHANGE [UNPARTITIONED]
|
| | order by: col ASC
|
| | limit: 10
|
| |
|
| 01:TOP-N [LIMIT=10]
|
| | order by: col ASC
|
| | row-size=4B cardinality=3
|
| |
|
| 00:SCAN HDFS [default.test]
|
| HDFS partitions=1/1 files=1 size=10B
|
| row-size=4B cardinality=3
|
+------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String
|
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3
|
| Per-Host Resource Estimates: Memory=32MB
|
| WARNING: The following tables are missing relevant table and/or column
statistics. |
| default.test
|
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10
AS |
| TINYINT)
|
|
|
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B
thread-reservation=1 |
| PLAN-ROOT SINK
|
| | output exprs: col
|
| | mem-estimate=0B mem-reservation=0B thread-reservation=0
|
| |
|
| 02:MERGING-EXCHANGE [UNPARTITIONED]
|
| order by: col ASC
|
| limit: 10
|
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|
| tuple-ids=1 row-size=4B cardinality=3
|
| in pipelines: 01(GETNEXT)
|
|
|
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
|
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB
thread-reservation=2 |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
|
| | mem-estimate=0B mem-reservation=0B thread-reservation=0
|
| 01:TOP-N [LIMIT=10]
|
| | order by: col ASC
|
| | mem-estimate=12B mem-reservation=0B thread-reservation=0
|
| | tuple-ids=1 row-size=4B cardinality=3
|
| | in pipelines: 01(GETNEXT), 00(OPEN)
|
| |
|
| 00:SCAN HDFS [default.test, RANDOM]
|
| HDFS partitions=1/1 files=1 size=10B
|
| stored statistics:
|
| table: rows=unavailable size=unavailable
|
| columns: unavailable
|
| extrapolated-rows=disabled max-scan-range-rows=unavailable
|
| mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
|
| tuple-ids=0 row-size=4B cardinality=3
|
| in pipelines: 00(GETNEXT)
|
+--------------------------------------------------------------------------------------+
[localhost:21000] default> create table part_test (col int) partitioned by
(partcol int);
[localhost:21000] default> alter table part_test add partition (partcol=1);
[localhost:21000] default> insert into table part_test partition (partcol=1)
values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |
Incremental stats | Location |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT |
false | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total | -1 | 1 | 10B | 0B | | |
| |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> show partitions part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |
Incremental stats | Location |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT |
false | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total | -1 | 1 | 10B | 0B | | |
| |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> explain select * from part_test order by col limit
10;
+--------------------------------------------------------------------------------------+
| Explain String
|
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3
|
| Per-Host Resource Estimates: Memory=32MB
|
| WARNING: The following tables are missing relevant table and/or column
statistics. |
| default.part_test
|
| Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT
CAST(10 |
| AS TINYINT)
|
|
|
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B
thread-reservation=1 |
| PLAN-ROOT SINK
|
| | output exprs: col, partcol
|
| | mem-estimate=0B mem-reservation=0B thread-reservation=0
|
| |
|
| 02:MERGING-EXCHANGE [UNPARTITIONED]
|
| order by: col ASC
|
| limit: 10
|
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|
| tuple-ids=1 row-size=8B cardinality=1
|
| in pipelines: 01(GETNEXT)
|
|
|
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
|
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB
thread-reservation=2 |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
|
| | mem-estimate=0B mem-reservation=0B thread-reservation=0
|
| 01:TOP-N [LIMIT=10]
|
| | order by: col ASC
|
| | mem-estimate=8B mem-reservation=0B thread-reservation=0
|
| | tuple-ids=1 row-size=8B cardinality=1
|
| | in pipelines: 01(GETNEXT), 00(OPEN)
|
| |
|
| 00:SCAN HDFS [default.part_test, RANDOM]
|
| HDFS partitions=1/1 files=1 size=10B
|
| stored statistics:
|
| table: rows=unavailable size=unavailable
|
| partitions: 0/1 rows=1
|
| columns missing stats: col
|
| extrapolated-rows=disabled max-scan-range-rows=unavailable
|
| mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
|
| tuple-ids=0 row-size=8B cardinality=1
|
| in pipelines: 00(GETNEXT)
|
+--------------------------------------------------------------------------------------+
{code}
It would be nice if there was a field in the explain plan similar to
"extrapolated-rows" that lists the estimated row count for the table.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]