David Rorke created IMPALA-13645:
------------------------------------
Summary: Account for impact of runtime filters when scheduling scan ranges
Key: IMPALA-13645
URL: https://issues.apache.org/jira/browse/IMPALA-13645
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: David Rorke
Runtime filters can introduce significant skew in the number of scan ranges
that a given executor or scan fragment actually processes. Initial scan range
scheduling attempts to balance ranges across hosts, and with multithreading
(e.g. mt_dop>0) there is additional balancing done locally within a given
executor, but neither of these balancing steps accounts for the impact of
runtime filters.
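
To make the skew concrete, here is a minimal Python sketch. It is not Impala's
scheduler: the round-robin assignment, the function names, and the
(partition_id, file_index) range representation are all assumptions chosen
for illustration.

```python
def round_robin_assign(scan_ranges, executors):
    """Spread ranges evenly across executors with no knowledge of filters."""
    assignment = {e: [] for e in executors}
    for i, scan_range in enumerate(scan_ranges):
        assignment[executors[i % len(executors)]].append(scan_range)
    return assignment


def surviving_counts(assignment, kept_partitions):
    """Count the ranges per executor whose partition passes the filter."""
    return {
        e: sum(1 for partition_id, _ in ranges if partition_id in kept_partitions)
        for e, ranges in assignment.items()
    }


# Hypothetical table: six partitions with two ranges each,
# represented as (partition_id, file_index).
scan_ranges = [(p, f) for p in range(6) for f in range(2)]
executors = ["exec-0", "exec-1", "exec-2"]

assignment = round_robin_assign(scan_ranges, executors)
scheduled = {e: len(r) for e, r in assignment.items()}
# A partition filter that arrives later and keeps only partitions 0 and 3.
after_filter = surviving_counts(assignment, kept_partitions={0, 3})

print(scheduled)     # {'exec-0': 4, 'exec-1': 4, 'exec-2': 4}
print(after_filter)  # {'exec-0': 2, 'exec-1': 2, 'exec-2': 0}
```

The initial schedule is perfectly even, yet one executor is left with no work
once the filter is applied.
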
One option for remote partition filters might be to wait for the filters to
arrive at the coordinator, apply them prior to scheduling, and then balance
the surviving scan ranges during scheduling. Another, more limited approach
would be to use the filters only for balancing across fragments within a given
executor: wait for the filter to arrive, prune the ranges assigned to that
executor, and hand the surviving ranges out to fragments in a deterministic
way.
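
Both options come down to "prune first, then balance". A rough Python sketch
of that step follows, under stated assumptions: the function name, the
(range_id, partition_id, size_bytes) tuples, and the greedy largest-first
policy are hypothetical and are not taken from Impala's code. The targets
could be executors (the coordinator-side option) or fragment instances within
one executor (the local option).

```python
import heapq


def prune_and_balance(scan_ranges, kept_partitions, num_targets):
    """Drop filtered-out ranges, then spread the survivors by bytes.

    scan_ranges: list of (range_id, partition_id, size_bytes) tuples.
    Returns one list of range ids per target.
    """
    survivors = [r for r in scan_ranges if r[1] in kept_partitions]
    # Deterministic order: largest range first, ties broken by range id.
    survivors.sort(key=lambda r: (-r[2], r[0]))

    # Min-heap of (assigned_bytes, target_index); ties go to the lower index.
    heap = [(0, i) for i in range(num_targets)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_targets)]
    for range_id, _, size_bytes in survivors:
        load, idx = heapq.heappop(heap)
        assignment[idx].append(range_id)
        heapq.heappush(heap, (load + size_bytes, idx))
    return assignment


ranges = [("a", 0, 100), ("b", 1, 50), ("c", 0, 60), ("d", 0, 40), ("e", 2, 70)]
# Only partition 0 survives the filter; balance across two fragment instances.
print(prune_and_balance(ranges, kept_partitions={0}, num_targets=2))
# [['a'], ['c', 'd']]
```

Because the sort key and the heap tie-breaking are fixed, the same inputs
always produce the same assignment, which is one way to read the
"deterministic" requirement above.
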