[
https://issues.apache.org/jira/browse/IMPALA-12018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767048#comment-17767048
]
Csaba Ringhofer edited comment on IMPALA-12018 at 9/20/23 8:40 AM:
-------------------------------------------------------------------
> 1. Runtime filter arrived on time, or it is guaranteed that scan node will
> need to wait for that runtime filter arrival (i.e., join node right above the
> scan will not start pulling rows before its join build completes).
I was thinking about a concept like "critical runtime filter": we could
designate some filters as critical and wait for them indefinitely. Critical
filters could be factored into the resource estimates. Meanwhile, other filters
would be treated as optional, and scanner nodes would wait for them only up to
a timeout. Optional filters would be ignored in resource estimates.
We could treat filters consumed by the probe side scanner as critical, as we
have to wait for all producing build nodes to complete anyway. Large build side
scanners could also be treated as critical if selectivity and fpp look good.
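To make the idea concrete, here is a minimal Python sketch of the
classification. Everything in it is hypothetical and only for illustration:
the RuntimeFilter fields, the thresholds, and the timeout value are not taken
from the Impala code base, and multiplying selectivities assumes the filters
are independent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuntimeFilter:
    # All fields are hypothetical planner-side estimates.
    name: str
    consumer_on_probe_side: bool  # consumed by the probe side scanner of the producing join
    consumer_scan_rows: int       # estimated rows read by the consuming scanner
    est_selectivity: float        # estimated fraction of rows that pass the filter
    est_fpp: float                # estimated false positive probability


def is_critical(f: RuntimeFilter,
                large_scan_rows: int = 1_000_000,
                max_selectivity: float = 0.1,
                max_fpp: float = 0.1) -> bool:
    """Classify a filter as critical (wait indefinitely) or optional."""
    # The probe side has to wait for the build to complete anyway.
    if f.consumer_on_probe_side:
        return True
    # Otherwise only large scans with good-looking selectivity and fpp qualify.
    return (f.consumer_scan_rows >= large_scan_rows
            and f.est_selectivity <= max_selectivity
            and f.est_fpp <= max_fpp)


def wait_time_ms(f: RuntimeFilter, optional_timeout_ms: int = 1000) -> Optional[int]:
    """Return None for 'wait indefinitely', else the timeout for optional filters."""
    return None if is_critical(f) else optional_timeout_ms


def estimated_scan_rows(input_rows: int, filters: List[RuntimeFilter]) -> float:
    """Apply only critical filters to the estimate; optional ones are ignored."""
    rows = float(input_rows)
    for f in filters:
        if is_critical(f):
            rows *= f.est_selectivity
    return rows
```

With this split, a resource estimate can rely on a critical filter because the
scanner is guaranteed to have it, while a late optional filter never makes the
estimate wrong.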
> The second point can be tricky given that Impala's default
> RUNTIME_FILTER_ERROR_RATE == max_filter_error_rate == 0.75, and join build
> cardinality itself can be underestimated, leading to an undersized bloom
> filter.
We could use the concept of "hard estimates" introduced in
https://gerrit.cloudera.org/#/c/20366/ - there could be separate min/max limits
for filters produced by build sides with large estimates. We still need a MAX
limit IMO to avoid problems with extremely large filters, but it could be much
higher.
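To illustrate why an underestimated build cardinality hurts, here is a small
Python sketch using the textbook formulas for a classic bloom filter. This is
an assumption made for illustration: Impala's own filter layout and sizing
logic may differ, and the function names and limits below are hypothetical.

```python
import math


def size_bits(est_ndv: int, target_fpp: float) -> int:
    """Bits needed by a classic bloom filter for est_ndv keys at target_fpp."""
    return math.ceil(-est_ndv * math.log(target_fpp) / (math.log(2) ** 2))


def num_hashes(bits: int, est_ndv: int) -> int:
    """Near-optimal number of hash functions for the chosen size."""
    return max(1, round(bits / est_ndv * math.log(2)))


def actual_fpp(bits: int, hashes: int, actual_ndv: int) -> float:
    """False positive probability once actual_ndv keys have been inserted."""
    return (1.0 - math.exp(-hashes * actual_ndv / bits)) ** hashes


def clamp_bits(bits: int, min_bits: int, max_bits: int) -> int:
    """Apply min/max size limits; the limits themselves are hypothetical."""
    return max(min_bits, min(bits, max_bits))


# Size the filter for an estimated 1M distinct keys at a 1% target fpp,
# then compare the fpp when the build side really has 10x more keys.
bits = size_bits(1_000_000, 0.01)
k = num_hashes(bits, 1_000_000)
fpp_as_estimated = actual_fpp(bits, k, 1_000_000)
fpp_underestimated = actual_fpp(bits, k, 10_000_000)
```

In this sketch, fpp_as_estimated stays close to the target, while
fpp_underestimated is close to 1, so the filter rejects almost nothing. A
higher MAX limit helps only if the estimate that drives the sizing can be
trusted, which is where "hard estimates" would come in.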
> Consider runtime filters in resource estimates
> ----------------------------------------------
>
> Key: IMPALA-12018
> URL: https://issues.apache.org/jira/browse/IMPALA-12018
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Csaba Ringhofer
> Assignee: Riza Suminto
> Priority: Major
>
> Currently Impala creates a plan first and looks for runtime filters based on
> the complete plan.
> IMPALA-3573 is about considering runtime filters during join ordering, which
> would be a major change. Meanwhile, it could also be useful to consider
> selective-looking runtime filters in resource estimates without changing the
> plan topology.