[ 
https://issues.apache.org/jira/browse/IMPALA-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741348#comment-16741348
 ] 

ASF subversion and git services commented on IMPALA-8021:
---------------------------------------------------------

Commit a7ea86b768247ff5388174445e7c91736b99c2de in impala's branch 
refs/heads/master from paul-rogers
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=a7ea86b ]

IMPALA-8021: Add estimated cardinality to EXPLAIN output

Cardinality is vital to understanding why a plan has the form it does,
yet the planner normally emits cardinality information only for the
detailed levels. Unfortunately, most query profiles we see are at the
standard level without this information (except in the summary table),
making it hard to understand what happened.

This patch adds cardinality to the standard EXPLAIN output. It also
changes the displayed cardinality value to be in abbreviated "metric"
form: 1.23K instead of 1234, etc.
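
To illustrate the abbreviated "metric" form, here is a minimal sketch
in Python. It is not the code from this patch: the function name
format_metric, the two-decimal precision, and the suffix list are
assumptions chosen to reproduce the examples quoted in this message
(1234 as 1.23K, 1234567890 as 1.23G).

```python
def format_metric(value: int) -> str:
    """Abbreviate a non-negative count using power-of-1000 suffixes.

    Hypothetical helper for illustration only; Impala's own formatting
    code may differ in precision and edge-case handling.
    """
    if value < 0:
        raise ValueError("cardinality must be non-negative")
    units = ["", "K", "M", "G", "T", "P"]
    scaled = float(value)
    idx = 0
    # Step up one unit while the value, rounded to two decimals, would
    # still print as 1000 or more in the current unit.
    while idx < len(units) - 1 and round(scaled, 2) >= 1000:
        scaled /= 1000
        idx += 1
    if idx == 0:
        # Small values are printed exactly, with no suffix.
        return str(value)
    return f"{scaled:.2f}{units[idx]}"


print(format_metric(1234))        # 1.23K
print(format_metric(1234567890))  # 1.23G
```

Rounding before the comparison keeps a value such as 999999 from being
printed as "1000.00K"; whether the real formatter does the same is not
stated here.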

Changing the EXPLAIN output has a huge impact on PlannerTest: all the
"golden" test files must change. To avoid doing this twice, this patch
also includes:

IMPALA-7919: Add predicates line in plan output for partition key
predicates

This is also the time to include:

IMPALA-8022: Add cardinality checks to PlannerTest

The comparison code was changed to allow a set of validators, one of
which compares cardinality to ensure it is within 5% of the expected
value. This should ensure we don't change estimates unintentionally.
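
The 5% check can be sketched roughly as follows. This is a Python
illustration, not the PlannerTest code (which is Java): the names
parse_cardinality and cardinality_within_tolerance, the regular
expression, and the handling of lines without a numeric cardinality
are all assumptions.

```python
import re

# Multipliers for the abbreviated "metric" suffixes (powers of 1000).
_SUFFIX = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}
_CARD_RE = re.compile(r"cardinality=(\d+(?:\.\d+)?)([KMGT]?)")


def parse_cardinality(line):
    """Return the cardinality on a plan line as a float, or None."""
    match = _CARD_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)) * _SUFFIX[match.group(2)]


def cardinality_within_tolerance(expected_line, actual_line, tolerance=0.05):
    """True if the actual cardinality is within `tolerance` of expected."""
    expected = parse_cardinality(expected_line)
    actual = parse_cardinality(actual_line)
    if expected is None or actual is None:
        # Assumption: lines match only if neither has a numeric value.
        return expected is None and actual is None
    if expected == 0:
        return actual == 0
    return abs(actual - expected) <= tolerance * expected


golden = "|  row-size=89B cardinality=1.23K"
print(cardinality_within_tolerance(golden, "|  row-size=89B cardinality=1.25K"))
print(cardinality_within_tolerance(golden, "|  row-size=89B cardinality=1.50K"))
```

A relative tolerance lets small estimate changes pass while still
flagging a shift large enough to alter a plan.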

While many planner tests are concerned with cardinality, many others are
not, and testing showed that the cardinality is unstable in some of
them. For such tests, this patch adds a filter that ignores cardinality.
The filter is enabled by default (for backward compatibility) but
disabled (to allow cardinality verification) for the critical tests.
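
One way to picture such a filter is as a step that strips the
cardinality token from each line before the expected and actual plans
are compared. The Python sketch below is only an illustration; the
names strip_cardinality and plans_match and the ignore_cardinality
flag are hypothetical, and the real PlannerTest options may be
structured differently.

```python
import re

# Matches a "cardinality=<value>" token plus any whitespace before it.
_CARD_TOKEN = re.compile(r"\s*cardinality=\S+")


def strip_cardinality(line):
    """Remove the cardinality token so it cannot affect comparison."""
    return _CARD_TOKEN.sub("", line)


def plans_match(expected, actual, ignore_cardinality=True):
    """Compare two plans line by line.

    The filter is on by default, mirroring the backward-compatible
    default described above; pass ignore_cardinality=False for tests
    that should verify cardinality.
    """
    if ignore_cardinality:
        expected = [strip_cardinality(line) for line in expected]
        actual = [strip_cardinality(line) for line in actual]
    return expected == actual


golden = ["HASH JOIN [INNER JOIN, BROADCAST]", "|  row-size=89B cardinality=1.23G"]
result = ["HASH JOIN [INNER JOIN, BROADCAST]", "|  row-size=89B cardinality=4.56G"]
print(plans_match(golden, result))                            # True
print(plans_match(golden, result, ignore_cardinality=False))  # False
```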

Rebasing the tests was complicated by a bug in the error-matching code,
so this patch also fixes:

IMPALA-8023: Fix PlannerTest to handle error lines consistently

Now, the error output written to the output "save results" file matches
that expected in the "golden" file -- no more handling these specially.

Testing:

* Added cardinality verification.
* Reran all FE tests.
* Rebased all PlannerTest .test files.
* Adjusted the metadata/test_explain.py test to handle the changed
  EXPLAIN output.

Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Reviewed-on: http://gerrit.cloudera.org:8080/12136
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add estimated cardinality to EXPLAIN output
> -------------------------------------------
>
>                 Key: IMPALA-8021
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8021
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The EXPLAIN output provides much useful information in the plan tree. All our 
> planning decisions are based on cardinality, but it appears in the EXPLAIN 
> output only for the EXTENDED level. The profile contains only the plan from 
> the STANDARD level. This change proposes to include row size and cardinality 
> even in the STANDARD level.
> The nodes that have the information call it "cardinality", so continue to use 
> that term.
> Add cardinality to each node so it appears something like this:
> {noformat}
> HASH JOIN [INNER JOIN, BROADCAST]
> |  row-size=89B cardinality=1.23G
> |
> |--SCAN HDFS [db.table]
> |     partitions=2/123 files=2 size=4.56MB row-size=89B cardinality=7.89M
> {noformat}
> Cardinality should appear in all levels above MINIMAL. Cardinality is not 
> needed for EXCHANGE since it can be inferred from other nodes.
> Also, the existing code prints large cardinalities in detail: 1234567890, 
> which is hard to read. Use the abbreviated output, using metric (power of 
> 1000) units, so 1.23G instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
