[https://issues.apache.org/jira/browse/IMPALA-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867447#comment-17867447]
David Rorke commented on IMPALA-12358:
--------------------------------------
We should be able to develop a general costing model for the filters that takes
into account various factors including:
* The expected selectivity of the filter (this becomes trickier for cascading
filters)
* The expected reduction in scan time given that selectivity (some of the
recent compute processing cost changes, e.g. IMPALA-12657, already attempt to
predict scan cost by combining cardinality estimates with baseline benchmark
results)
We could use this "expected filter benefit" both to determine the set of
filters to generate at planning time and to decide how long to wait for an
individual filter (we might still schedule a filter that is expected to reduce
scan time by only 1 second, but we definitely wouldn't wait 10 seconds for
it).
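The "expected filter benefit" idea above can be sketched roughly as follows. This is a minimal illustration, not Impala planner code: `FilterEstimate`, `combined_selectivity`, `expected_benefit_ms`, `plan_filter`, and the `min_benefit_ms` threshold are all hypothetical names. It assumes scan time scales linearly with the fraction of rows that pass the filter, and that cascading filters are independent (which real filters often are not).

```python
from dataclasses import dataclass


@dataclass
class FilterEstimate:
    """Hypothetical planner-side estimate for one runtime filter."""
    name: str
    selectivity: float  # fraction of probe-side rows expected to PASS (0..1)


def combined_selectivity(filters):
    """Selectivity of several filters on one scan.
    Assumes independence, which cascading/correlated filters may violate."""
    result = 1.0
    for f in filters:
        result *= f.selectivity
    return result


def expected_benefit_ms(scan_cost_ms, selectivity):
    """Scan time saved if the filter is applied from the start of the scan.
    Assumes scan time is proportional to the fraction of rows scanned."""
    return scan_cost_ms * (1.0 - selectivity)


def plan_filter(scan_cost_ms, selectivity, max_wait_ms, min_benefit_ms=100.0):
    """Return (schedule, wait_ms).

    Skip filters whose expected saving is below a (hypothetical) threshold,
    and never wait longer than the time the filter is expected to save.
    """
    benefit = expected_benefit_ms(scan_cost_ms, selectivity)
    if benefit < min_benefit_ms:
        return False, 0.0
    return True, min(max_wait_ms, benefit)
```

For example, `plan_filter(2000.0, 0.5, 10000.0)` returns `(True, 1000.0)`: the filter is scheduled, but the scan waits at most the one second it is expected to save rather than the full ten seconds.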
> Skip scheduling runtime filter that unlikely to meet RUNTIME_FILTER_WAIT_TIME_MS constraint
> -------------------------------------------------------------------------------------------
>
> Key: IMPALA-12358
> URL: https://issues.apache.org/jira/browse/IMPALA-12358
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Riza Suminto
> Priority: Major
> Labels: runtime-filters
>
> A scan node waits up to
> [RUNTIME_FILTER_WAIT_TIME_MS|https://impala.apache.org/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html]
> for a runtime filter to arrive. If the filter has not arrived within this
> period, the scan node stops waiting and starts scanning. A late runtime filter
> can still be applied if scanning has not finished when the filter arrives.
> However, the planner should predict, to some degree, whether a given runtime
> filter is likely to be late, and if so, skip scheduling that filter.
> The prediction can be based on either of the following:
> * the total volume of all scans on the build side of the join;
> * the distance between the join node and the furthest fragment in the
> build-side direction.
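The late-filter prediction described in the issue could be approximated along these lines. This is a hedged sketch rather than a proposed implementation: the linear arrival model, the constants `bytes_per_ms` and `per_hop_ms`, and every function and class name are hypothetical. It combines both signals (build-side scan volume and fragment distance) additively, which is only one option; the issue notes that either signal could be used on its own.

```python
from dataclasses import dataclass


@dataclass
class BuildSide:
    """Hypothetical planner summary of a join's build side."""
    scan_bytes: int     # total bytes read by all scans on the build side
    fragment_hops: int  # fragments between the join node and the furthest build-side fragment


def predicted_arrival_ms(build: BuildSide, bytes_per_ms: float, per_hop_ms: float) -> float:
    """Rough filter arrival estimate: build-side scan time plus a fixed latency per hop."""
    return build.scan_bytes / bytes_per_ms + build.fragment_hops * per_hop_ms


def should_schedule(build: BuildSide, wait_time_ms: float,
                    bytes_per_ms: float = 100_000.0, per_hop_ms: float = 50.0) -> bool:
    """Schedule the filter only if it is predicted to arrive within the wait time
    (the value of RUNTIME_FILTER_WAIT_TIME_MS, passed in by the caller)."""
    return predicted_arrival_ms(build, bytes_per_ms, per_hop_ms) <= wait_time_ms


def late_filter_still_applies(arrival_ms: float, wait_time_ms: float, probe_scan_ms: float) -> bool:
    """A late filter can still be applied if it arrives before the scan finishes.
    Assumes the scan starts exactly when the wait expires and runs for probe_scan_ms."""
    return arrival_ms < wait_time_ms + probe_scan_ms
```

With these placeholder constants, a 50 MB build side two fragments away is predicted to arrive in about 600 ms, so it would be scheduled under a 1000 ms wait, while a 2 GB build side would be skipped. A real planner would likely calibrate the rates from benchmark data instead of fixed defaults.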
--
This message was sent by Atlassian Jira
(v8.20.10#820010)