[jira] [Commented] (IMPALA-12881) Use JoinNode.getFkPkJoinCardinality in reduceCardinalityForScanNode

ASF subversion and git services (Jira) Thu, 04 Apr 2024 13:54:03 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834076#comment-17834076
 ]


ASF subversion and git services commented on IMPALA-12881:
----------------------------------------------------------

Commit 97adba5192cff75f91255105bc871ec542f390de in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=97adba519 ]

IMPALA-12881: Use getFkPkJoinCardinality to reduce scan cardinality

IMPALA-12018 added reduceCardinalityForScanNode to lower the cardinality
estimate when a runtime filter is involved. It calls
JoinNode.computeGenericJoinCardinality(). However, if the originating
join node has an FK-PK conjunct, it should be possible to obtain a lower
cardinality estimate by calling JoinNode.getFkPkJoinCardinality()
instead.

This patch adds that analysis and calls
JoinNode.getFkPkJoinCardinality() when possible. The change is, however,
limited to runtime filters that are evaluated at the storage layer, such
as partition filters and pushed-down Kudu filters. Row-level runtime
filters that are evaluated at the scan node continue to use
JoinNode.computeGenericJoinCardinality().
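
The selection rule described above can be sketched roughly as follows.
This is only an illustration, not the code from the patch: the class
ScanCardinalitySketch, the FilterKind enum, and pickJoinCardinality are
hypothetical names, and the two estimates are passed in as plain numbers
rather than computed by the real JoinNode methods.

```java
// Hypothetical sketch; none of these names exist in Impala.
final class ScanCardinalitySketch {
  /** Where a runtime filter is evaluated (assumed classification). */
  enum FilterKind { PARTITION, KUDU_PUSHED_DOWN, ROW_LEVEL }

  /** Partition and pushed-down Kudu filters count as storage layer filters. */
  static boolean isStorageLayer(FilterKind kind) {
    return kind == FilterKind.PARTITION || kind == FilterKind.KUDU_PUSHED_DOWN;
  }

  /**
   * Picks the join cardinality used to reduce the scan estimate.
   * genericEstimate stands in for JoinNode.computeGenericJoinCardinality()
   * and fkPkEstimate for JoinNode.getFkPkJoinCardinality().
   */
  static long pickJoinCardinality(FilterKind kind, boolean hasFkPkConjunct,
      long genericEstimate, long fkPkEstimate) {
    if (hasFkPkConjunct && isStorageLayer(kind)) {
      return fkPkEstimate;
    }
    // Row-level filters, or joins without an FK-PK conjunct, keep the
    // generic estimate.
    return genericEstimate;
  }

  public static void main(String[] args) {
    System.out.println(
        pickJoinCardinality(FilterKind.PARTITION, true, 1000L, 200L));
    System.out.println(
        pickJoinCardinality(FilterKind.ROW_LEVEL, true, 1000L, 200L));
  }
}
```

The sample numbers in main are arbitrary; the point is only that the
FK-PK estimate is used when both conditions hold.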

The distinction exists because a storage layer filter is applied more
consistently than a row-level filter. For example, a partition filter
evaluates every partition_id and is never disabled, regardless of its
precision (see HdfsScanNodeBase::PartitionPassesFilters). On the other
hand, the scan node can disable a row-level filter later on if it is
deemed ineffective or not precise enough (see
HdfsScanner::CheckFiltersEffectiveness,
LocalFilterStats::enabled_for_row, and the min_filter_reject_ratio
flag). For a pushed-down Kudu filter, Impala relies on Kudu to evaluate
the filter.
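
To illustrate why a row-level filter is a less reliable basis for an
estimate, here is a minimal sketch of a reject-ratio check. The real
logic lives in Impala's C++ backend and may differ in detail; the class
RowFilterStatsSketch and its methods are hypothetical, and the rule
"disable when rejected / considered falls below the minimum ratio" is an
assumption based on the flag name.

```java
// Hypothetical sketch; not the backend implementation.
final class RowFilterStatsSketch {
  private final double minRejectRatio; // stands in for min_filter_reject_ratio
  private long considered;
  private long rejected;
  private boolean enabledForRow = true;

  RowFilterStatsSketch(double minRejectRatio) {
    this.minRejectRatio = minRejectRatio;
  }

  /** Accumulates how many rows the filter saw and how many it rejected. */
  void record(long consideredRows, long rejectedRows) {
    considered += consideredRows;
    rejected += rejectedRows;
  }

  /** Disables the filter once its observed reject ratio is too low. */
  void checkEffectiveness() {
    if (enabledForRow && considered > 0
        && (double) rejected / considered < minRejectRatio) {
      enabledForRow = false; // once disabled, it stays disabled in this sketch
    }
  }

  boolean isEnabledForRow() {
    return enabledForRow;
  }

  public static void main(String[] args) {
    RowFilterStatsSketch stats = new RowFilterStatsSketch(0.1);
    stats.record(1000, 50); // only 5% of rows rejected
    stats.checkEffectiveness();
    System.out.println(stats.isEnabledForRow());
  }
}
```

Because such a filter can be switched off partway through a scan, the
planner cannot assume it reduces the row count as predictably as a
partition filter does.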

Runtime filters can arrive late as well, but this applies to both storage
layer filters and row-level filters: the scan node can stop waiting and
start scanning once runtime_filter_wait_time_ms has passed. The scan
node will still evaluate a late-arriving runtime filter if the scan is
still in progress.

Also, note that this cardinality reduction algorithm considers only
highly selective runtime filters, to increase confidence in its
estimate (see RuntimeFilter.isHighlySelective()).

Testing:
- Update TpcdsCpuCostPlannerTest.
- Pass FE tests.

Change-Id: I6efafffc8f96247a860b88e85d9097b2b4327f32
Reviewed-on: http://gerrit.cloudera.org:8080/21118
Reviewed-by: Wenzhe Zhou <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Use JoinNode.getFkPkJoinCardinality in reduceCardinalityForScanNode
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12881
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12881
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>
> IMPALA-12018 adds reduceCardinalityForScanNode to lower cardinality
> estimation when a runtime filter is involved. It calls
> JoinNode.computeGenericJoinCardinality(). However, if the originating join
> node has an FK-PK conjunct, it should be possible to obtain a lower
> cardinality estimate by calling JoinNode.getFkPkJoinCardinality() instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
