[ https://issues.apache.org/jira/browse/IMPALA-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated IMPALA-8021:
--------------------------------
    Description: 
The EXPLAIN output provides much useful information in the plan tree. All our 
planning decisions are based on cardinality, but it appears in the EXPLAIN 
output only at the EXTENDED level. The profile contains only the plan from the 
STANDARD level. This change proposes to include row size and cardinality even 
at the STANDARD level.

The nodes that have the information call it "cardinality", so continue to use 
that term.

Add cardinality to each node so it appears something like this:

{noformat}
HASH JOIN [INNER JOIN, BROADCAST]
|  row-size=89B cardinality=1.23G
|
|--SCAN HDFS [db.table]
|     partitions=2/123 files=2 size=4.56MB row-size=89B cardinality=7.89M
{noformat}

Cardinality should appear in all levels above MINIMAL. Cardinality is not 
needed for EXCHANGE since it can be inferred from other nodes.
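
As a rough illustration of the rule above (not the actual planner code), the 
decision can be sketched as follows. The enum, its numeric values, the 
node-type strings, and the function name are all hypothetical and exist only 
for this sketch.

```python
from enum import IntEnum


class ExplainLevel(IntEnum):
    """Hypothetical model of the explain levels, ordered by verbosity."""
    MINIMAL = 0
    STANDARD = 1
    EXTENDED = 2
    VERBOSE = 3


def shows_cardinality(level: ExplainLevel, node_type: str) -> bool:
    """Return True if a node's detail line should include cardinality."""
    # EXCHANGE is skipped: its cardinality can be inferred from other nodes.
    if node_type == "EXCHANGE":
        return False
    # Every level above MINIMAL prints cardinality.
    return level > ExplainLevel.MINIMAL


if __name__ == "__main__":
    for level in ExplainLevel:
        print(level.name, shows_cardinality(level, "HASH JOIN"))
```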

Also, the existing code prints large cardinalities in full, such as 
1234567890, which is hard to read. Use abbreviated output with metric (power 
of 1000) units instead, giving 1.23G.
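
A minimal sketch of the abbreviation, to make the intended format concrete. 
It is not the existing formatting code: the function name, the suffix list, 
and the choice of two decimal places (taken from the 1.23G and 7.89M examples) 
are assumptions.

```python
# Metric (power-of-1000) suffixes; the list is an assumption for this sketch.
SUFFIXES = ["", "K", "M", "G", "T", "P"]


def format_cardinality(rows: int) -> str:
    """Abbreviate a non-negative row count using metric units."""
    if rows < 0:
        raise ValueError("row count must be non-negative")
    value = float(rows)
    idx = 0
    # Divide by 1000 until the value fits under the current unit.
    while value >= 1000 and idx < len(SUFFIXES) - 1:
        value /= 1000
        idx += 1
    # Avoid output such as "1000.00K" when rounding crosses a unit boundary.
    if round(value, 2) >= 1000 and idx < len(SUFFIXES) - 1:
        value /= 1000
        idx += 1
    if idx == 0:
        return str(rows)  # small counts are printed exactly
    return f"{value:.2f}{SUFFIXES[idx]}"


if __name__ == "__main__":
    for n in (89, 7_890_000, 1_234_567_890):
        print(n, "->", format_cardinality(n))
```

Counts below 1000 are printed exactly, so small tables still show their true 
row counts.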

  was:
The EXPLAIN output provides much useful information in the plan tree. But, it 
lacks a vital piece of information: the expected cardinality at each step. All 
our planning decisions are based on this cardinality; we need it to make sense 
of the generated plan.

Add cardinality to each node so it appears something like this:

{noformat}
HASH JOIN [INNER JOIN, BROADCAST]
|  rows=1.23G
|
|--SCAN HDFS [db.table]
|     partitions=2/123 files=2 size=4.56MB rows=7.89M
{noformat}

Only the scan and join have interesting cardinality numbers: Exchange nodes 
always have the same as their child. Aggregation may also be useful.

Cardinality should appear in all levels above MINIMAL.


> Add estimated cardinality to EXPLAIN output
> -------------------------------------------
>
>                 Key: IMPALA-8021
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8021
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
