[ https://issues.apache.org/jira/browse/IMPALA-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867447#comment-17867447 ]

David Rorke commented on IMPALA-12358:
--------------------------------------

We should be able to develop a general costing model for the filters that takes 
into account various factors including:
 * The expected selectivity of the filter (this becomes trickier for cascading 
filters)
 * The expected reduction in scan time given that selectivity (some of the 
recent changes for compute processing cost, e.g. IMPALA-12657, already attempt 
to predict scan cost by combining cardinality estimates with baseline 
benchmark results)
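As a rough illustration of such a cost model, here is a minimal Python sketch. It is not Impala's planner code, which is Java. It assumes the scan time saved is proportional to the fraction of rows a filter eliminates. It also assumes cascading filters are independent, so their selectivities simply multiply. `ScanEstimate`, `filterable_fraction`, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScanEstimate:
    # Hypothetical inputs: predicted scan time with no runtime filter, and the
    # fraction of that time spent on work a filter could let the scan skip.
    unfiltered_scan_ms: float
    filterable_fraction: float  # between 0.0 and 1.0


def combined_selectivity(selectivities):
    """Fraction of rows surviving all filters, ASSUMING the filters are
    independent (a simplification; cascading filters are often correlated)."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def expected_filter_benefit_ms(scan, selectivity):
    """Expected scan time saved (ms) if a filter passing `selectivity` of the
    rows arrives before scanning starts. Linear cost model is an assumption."""
    eliminated = 1.0 - selectivity
    return scan.unfiltered_scan_ms * scan.filterable_fraction * eliminated
```

For example, a 10 s scan where half the time is filterable, paired with a filter that passes 20% of rows, yields an expected benefit of about 4000 ms. A better model would replace the linear term with per-operator costs like those produced by the compute-processing-cost work.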

We could use this "expected filter benefit" both to decide which filters to 
generate at planning time and to decide how long to wait for an individual 
filter (we might still schedule a filter that's only expected to reduce scan 
time by 1 second, but we definitely wouldn't wait 10 seconds for it).
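A minimal sketch of how such a benefit estimate could drive both decisions. The `min_benefit_ms` threshold is a hypothetical planner knob, and `max_wait_ms` stands in for RUNTIME_FILTER_WAIT_TIME_MS. Capping the wait at the expected benefit encodes the reasoning above: waiting longer than the time a filter is expected to save cannot pay off.

```python
def plan_runtime_filter(benefit_ms, max_wait_ms, min_benefit_ms=100.0):
    """Return (schedule, wait_ms) for one runtime filter.

    benefit_ms     -- expected scan time saved by the filter (from a cost model)
    max_wait_ms    -- stand-in for the RUNTIME_FILTER_WAIT_TIME_MS setting
    min_benefit_ms -- HYPOTHETICAL threshold below which the filter is not
                      worth generating at all
    """
    if benefit_ms < min_benefit_ms:
        # Too little expected benefit: skip scheduling the filter entirely.
        return False, 0
    # Schedule it, but never wait longer than the time it is expected to save,
    # and never longer than the configured wait limit.
    wait_ms = int(min(benefit_ms, max_wait_ms))
    return True, wait_ms
```

Using the example from the comment, `plan_runtime_filter(1000.0, 10000)` schedules the filter but waits at most 1000 ms for it. A filter expected to save 50 ms is not scheduled at all under the assumed 100 ms threshold.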

> Skip scheduling runtime filters that are unlikely to meet the 
> RUNTIME_FILTER_WAIT_TIME_MS constraint
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12358
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12358
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Priority: Major
>              Labels: runtime-filters
>
> A scan node will wait for a runtime filter to arrive for up to 
> [RUNTIME_FILTER_WAIT_TIME_MS|https://impala.apache.org/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html].
>  If the runtime filter has not arrived within this period, the scan node 
> stops waiting and starts scanning. A late runtime filter can still be applied 
> if scanning has not finished when the filter arrives. However, the planner 
> should predict, to some degree, whether a given runtime filter is likely to 
> be late, and if so, skip scheduling that filter.
> The prediction can be based on either of the following:
>  * total volume of all scans from the build side of join.
>  * the distance between the join node and the furthest fragment in the build 
> side direction.
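The two heuristics in the description could be sketched as follows. This is a hypothetical Python model, not the Impala planner API. `PlanNode`, the `is_exchange` flag marking fragment boundaries, and the scan-throughput and per-hop latency constants are all assumptions. The description offers the two heuristics as alternatives; this sketch sums both into one predicted arrival time, and either term can be dropped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    # HYPOTHETICAL plan-tree model, not Impala's PlanNode class.
    name: str
    scan_bytes: int = 0          # non-zero only for scan nodes
    is_exchange: bool = False    # True marks a fragment boundary
    children: List["PlanNode"] = field(default_factory=list)


def build_side_scan_bytes(node):
    """Heuristic 1: total volume of all scans under the join's build side."""
    return node.scan_bytes + sum(build_side_scan_bytes(c) for c in node.children)


def fragment_distance(node):
    """Heuristic 2: number of fragment boundaries (exchanges) between the join
    and the furthest fragment on the build side."""
    below = max((fragment_distance(c) for c in node.children), default=0)
    return below + (1 if node.is_exchange else 0)


def should_schedule_filter(build_root, wait_time_ms,
                           scan_bytes_per_ms=100_000.0,   # assumed ~100 MB/s
                           ms_per_fragment_hop=50.0):     # assumed latency
    """Return (schedule, predicted_arrival_ms). Skip the filter if it is
    predicted to arrive after the scan node stops waiting for it."""
    predicted_ms = (build_side_scan_bytes(build_root) / scan_bytes_per_ms
                    + fragment_distance(build_root) * ms_per_fragment_hop)
    return predicted_ms <= wait_time_ms, predicted_ms
```

Consider a build side that is an exchange over a join of a 50 MB scan and a second exchange over a 20 MB scan. It has 70 MB of scan volume and a fragment distance of 2. Under the assumed constants, the predicted arrival is 800 ms, so the filter would be kept with a 1000 ms wait time and skipped with a 500 ms one.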



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
