[
https://issues.apache.org/jira/browse/IMPALA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933430#comment-17933430
]
David Rorke commented on IMPALA-13645:
--------------------------------------
After some discussion with [~joemcdonnell], another potential solution would be
some form of work stealing of scan ranges across executors.
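
To make the work-stealing idea concrete, here is a minimal simulation sketch. It is not Impala code: the function name, the per-range cost model (cost 0 standing in for a range that a runtime filter prunes), and the policy of stealing from the tail of the longest queue are all assumptions chosen for illustration.

```python
from collections import deque


def run_with_work_stealing(assignments, cost):
    """Simulate executors scanning ranges, stealing work when idle.

    assignments: dict executor -> list of scan range ids (the initial schedule)
    cost: dict scan range id -> time units to scan that range
          (hypothetical model: 0 means a runtime filter pruned the range)
    Returns (done, makespan): the ranges each executor processed and the
    time at which the last executor finished.
    """
    queues = {e: deque(r) for e, r in assignments.items()}
    clock = {e: 0 for e in assignments}  # time at which each executor is free
    done = {e: [] for e in assignments}
    while any(queues.values()):
        # The executor that becomes free earliest acts next (ties by name).
        ex = min(clock, key=lambda e: (clock[e], e))
        if queues[ex]:
            rng = queues[ex].popleft()
        else:
            # Idle executor: steal from the tail of the longest queue.
            victim = max(queues, key=lambda e: (len(queues[e]), e))
            rng = queues[victim].pop()
        clock[ex] += cost[rng]
        done[ex].append(rng)
    return done, max(clock.values())
```

For example, with two executors holding four ranges each, where every range on one executor is pruned (cost 0) and every range on the other costs 1, the idle executor takes over half of the remaining work and the simulated scan finishes in 2 time units instead of 4.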
> Account for impact of runtime filters when scheduling scan ranges
> -----------------------------------------------------------------
>
> Key: IMPALA-13645
> URL: https://issues.apache.org/jira/browse/IMPALA-13645
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: David Rorke
> Priority: Major
>
> Runtime filters can introduce significant skew in the number of scan ranges
> scanned by a given executor or scan fragment. Initial scan range scheduling
> attempts to balance ranges across hosts, and with multithreading (e.g.
> mt_dop>0) there's additional balancing done locally within a given executor,
> but neither of these balancing steps accounts for the impact of runtime
> filters.
> One option for remote, partition filters might be to wait for the filters to
> arrive at the coordinator, apply the filters prior to scheduling, and then
> balance the surviving scan ranges during scheduling. Another more limited
> approach would be to only use the filters for balancing across fragments
> within a given executor (wait for the filter to arrive and then prune the
> ranges assigned to that executor and hand them out to fragments in a
> deterministic way).
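
The more limited option in the description (prune after the filter arrives, then hand out ranges deterministically within one executor) could look roughly like the sketch below. The function name, the tuple representation of a scan range, and the sort-then-round-robin policy are assumptions for illustration, not Impala's scheduler.

```python
def assign_ranges_to_fragments(ranges, passes_filter, num_fragments):
    """Prune scan ranges with an arrived filter, then balance the survivors.

    ranges: iterable of sortable scan range descriptors, here hypothetical
            (partition_id, file_name, offset) tuples
    passes_filter: callable returning True if the range survives the filter
    num_fragments: number of scan fragment instances on this executor
    Returns one list of ranges per fragment.
    """
    # Sorting makes the result independent of the input order, so the same
    # inputs always produce the same assignment.
    survivors = sorted(r for r in ranges if passes_filter(r))
    fragments = [[] for _ in range(num_fragments)]
    for i, rng in enumerate(survivors):
        # Round-robin over surviving ranges only, so pruned ranges do not
        # count toward any fragment's share.
        fragments[i % num_fragments].append(rng)
    return fragments
```

Round-robin balances the count of surviving ranges, not their sizes; a size-aware policy would be a natural refinement if range lengths vary widely.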