[
https://issues.apache.org/jira/browse/IMPALA-12018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798597#comment-17798597
]
ASF subversion and git services commented on IMPALA-12018:
----------------------------------------------------------
Commit b37a35aa139ff61a1f93a54a9902ea76a86cbe1d in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b37a35aa1 ]
IMPALA-12018: Consider runtime filter for cardinality reduction
Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.
This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
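
As a rough illustration of the idea above (not the code in this patch), the
sketch below derives a filter selectivity from the textbook equi-join
estimate |L| * |R| / max(NDV(L.key), NDV(R.key)) and applies it to a scan
cardinality. The function names and the exact formula are assumptions made
for illustration; the real logic lives in
JoinNode.computeGenericJoinCardinality() and may differ in detail.

```python
def generic_join_cardinality(probe_card, build_card, probe_ndv, build_ndv):
    """Textbook equi-join estimate (an assumption, not Impala's exact code):
    |probe| * |build| / max(NDV(probe key), NDV(build key))."""
    max_ndv = max(probe_ndv, build_ndv, 1)  # guard against division by zero
    return probe_card * build_card / max_ndv


def runtime_filter_selectivity(probe_card, build_card, probe_ndv, build_ndv):
    """Fraction of probe rows expected to survive the runtime filter."""
    if probe_card <= 0:
        return 1.0  # nothing to reduce
    join_card = generic_join_cardinality(
        probe_card, build_card, probe_ndv, build_ndv)
    # A filter can only remove rows, so cap the selectivity at 1.0.
    return min(1.0, join_card / probe_card)


def filtered_scan_cardinality(scan_card, selectivity):
    """Hypothetical analogue of computing 'filteredCardinality_' for a scan."""
    return int(round(scan_card * selectivity))


# Probe side: 1,000,000 rows with 1,000 distinct keys.
# Build side: 100 rows with 100 distinct keys.
sel = runtime_filter_selectivity(1_000_000, 100, 1_000, 100)
reduced = filtered_scan_cardinality(1_000_000, sel)
```

In this example the build side covers about one tenth of the probe keys, so
the scan estimate drops from 1,000,000 rows to about 100,000.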
The reduced cardinality is stored in new fields 'filteredCardinality_'
and 'filteredInputCardinality_', separate from existing fields
'cardinality_' and 'inputCardinality_'. Future work should merge the new
cardinality fields with the old cardinality fields after we can validate
that the cardinality reduction does not regress memory estimation.
While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lower
ProcessingCost, scan fragment parallelism, and CpuAsk, and it increases
the chance that a query is assigned to a smaller executor group set.
Other execution modes will see no change in their execution parallelism
or memory estimates.
This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE. It takes a value in the range
[0.0..1.0] and controls how much of the cardinality reduction from
runtime filter analysis is applied, which helps with benchmarking.
Setting it to 0.0 disables cardinality reduction. The default is 1.0.
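
One plausible reading of the scale, sketched below, is a linear
interpolation between the unfiltered and the fully filtered estimate. This
is an assumption made for illustration only; the commit message states just
that 0.0 disables the reduction and 1.0 is the default. The function name
scale_cardinality_reduction is hypothetical.

```python
def scale_cardinality_reduction(cardinality, filtered_cardinality, scale=1.0):
    """Interpolate between the original and the filtered estimate.

    Assumed semantics (not confirmed by the commit message):
      scale = 0.0 keeps the original cardinality (reduction disabled),
      scale = 1.0 applies the full reduction.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be within [0.0, 1.0]")
    reduction = cardinality - filtered_cardinality
    return int(round(cardinality - reduction * scale))


unchanged = scale_cardinality_reduction(1000, 100, scale=0.0)
halfway = scale_cardinality_reduction(1000, 100, scale=0.5)
full = scale_cardinality_reduction(1000, 100)
```

Under this assumption, intermediate values let a benchmark dial the effect
in gradually instead of switching it fully on or off.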
Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Ran full TPC-DS 3TB benchmark and saw no regression due to
query plan change.
- Pass core tests.
Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Reviewed-on: http://gerrit.cloudera.org:8080/20498
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Consider runtime filters in resource estimates
> ----------------------------------------------
>
> Key: IMPALA-12018
> URL: https://issues.apache.org/jira/browse/IMPALA-12018
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Csaba Ringhofer
> Assignee: Riza Suminto
> Priority: Major
>
> Currently Impala creates a plan first and looks for runtime filters based on
> the complete plan.
> IMPALA-3573 is about considering runtime filters during join ordering which
> would be a major change. Meanwhile, it could also be useful to consider
> runtime filters that look selective in resource estimates, without changing
> the plan topology.