[jira] [Commented] (IMPALA-11647) Row size for source tables in a cross join query is set to 0 in query plan

Qifan Chen (Jira) Mon, 10 Oct 2022 12:42:03 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615283#comment-17615283
 ]


Qifan Chen commented on IMPALA-11647:
-------------------------------------

The scan reports an output width of 0B instead of 8B because of this line of 
code: 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/ScanNode.java#L160
Once that restriction is relaxed, we can get a better plan, in which the row 
size is 8B and the number of rows equals the number of files in the table. 
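
To illustrate why a scan that emits one 8B count per file is enough for a 
count(*) over a cross join, here is a minimal Python sketch. It is not Impala 
code: the function names and the per-file row counts are hypothetical, and it 
only models the arithmetic, assuming each file can report its own row count.

```python
from itertools import product

ROW_SIZE_BYTES = 8  # assumption: each emitted row holds a single BIGINT count


def scan_with_count_star(file_row_counts):
    """Hypothetical scan model: emit one row per file holding that file's row count."""
    return list(file_row_counts)


def cross_join_count(left_counts, right_counts):
    """count(*) of a cross join, computed from per-file counts on each side."""
    return sum(l * r for l, r in product(left_counts, right_counts))


# Hypothetical per-file row counts for the two join inputs.
files_a = [3, 5, 2]
files_b = [4, 1]

left = scan_with_count_star(files_a)
right = scan_with_count_star(files_b)

# Reference answer: enumerate every joined row pair explicitly.
naive = sum(1 for _ in product(range(sum(files_a)), range(sum(files_b))))

assert cross_join_count(left, right) == naive
print(len(left), "rows from scan a,", len(left) * ROW_SIZE_BYTES, "bytes")
print("count(*) =", cross_join_count(left, right))
```

In this model each scan produces as many rows as there are files, each 8B 
wide, and the join result is still the product of the two table row counts.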



> Row size for source tables in a cross join query is set to 0 in query plan
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-11647
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11647
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Qifan Chen
>            Priority: Major
>
> The row-size in the following explain output for both source tables is set to 
> 0B.  On paper, it is possible to apply the count star optimization for such 
> queries and therefore set the row-size correctly. 
> {code:java}
> explain select count(*) from store_sales a, store_sales b limit 500
> +--------------------------------------------------------------+
> | Explain String                                               |
> +--------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=256.00KB Threads=5 |
> | Per-Host Resource Estimates: Memory=10MB                     |
> |                                                              |
> | PLAN-ROOT SINK                                               |
> | |                                                            |
> | 06:AGGREGATE [FINALIZE]                                      |
> | |  output: count:merge(*)                                    |
> | |  limit: 500                                                |
> | |  row-size=8B cardinality=1                                 |
> | |                                                            |
> | 05:EXCHANGE [UNPARTITIONED]                                  |
> | |                                                            |
> | 03:AGGREGATE                                                 |
> | |  output: count(*)                                          |
> | |  row-size=8B cardinality=1                                 |
> | |                                                            |
> | 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                  |
> | |  row-size=0B cardinality=8.30T                             |
> | |                                                            |
> | |--04:EXCHANGE [BROADCAST]                                   |
> | |  |                                                         |
> | |  01:SCAN HDFS [tpcds_parquet.store_sales b]                |
> | |     HDFS partitions=1824/1824 files=1824 size=199.83MB     |
> | |     row-size=0B cardinality=2.88M                          |
> | |                                                            |
> | 00:SCAN HDFS [tpcds_parquet.store_sales a]                   |
> |    HDFS partitions=1824/1824 files=1824 size=199.83MB        |
> |    row-size=0B cardinality=2.88M                             |
> +--------------------------------------------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
