David Rorke created IMPALA-13645:
------------------------------------

             Summary: Account for impact of runtime filters when scheduling scan ranges
                 Key: IMPALA-13645
                 URL: https://issues.apache.org/jira/browse/IMPALA-13645
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: David Rorke


Runtime filters can introduce significant skew in the number of scan ranges 
actually scanned by a given executor or scan fragment.  Initial scan range 
scheduling attempts to balance ranges across hosts, and with multithreading 
(e.g. mt_dop>0) there is additional balancing done locally within a given 
executor, but neither of these balancing steps accounts for the impact of 
runtime filters.
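
To make the skew concrete, here is a toy model in Python. It is not Impala 
code: the (partition, file) range tuples, the round_robin helper and the 
filter that keeps partitions 0 and 3 are all assumptions made up for 
illustration.  It only shows how an assignment that is balanced before 
filtering can leave one executor with nothing to scan afterwards.

```python
from collections import Counter


def round_robin(ranges, num_executors):
    """Assign each range to an executor in input order, ignoring any filter."""
    return {r: i % num_executors for i, r in enumerate(ranges)}


# Hypothetical scan ranges: 6 partitions with 2 files each.
ranges = [(p, f) for p in range(6) for f in range(2)]
assignment = round_robin(ranges, num_executors=3)

# Ranges per executor as seen by the scheduler (before any filter arrives).
before = Counter(assignment.values())

# Hypothetical partition filter that only lets partitions 0 and 3 through.
keep = {0, 3}
surviving = [r for r in ranges if r[0] in keep]

# Ranges per executor that actually need to be scanned.
after = Counter(assignment[r] for r in surviving)

print("scheduled:", dict(before))
print("scanned:  ", {e: after[e] for e in range(3)})
```

In this toy case each executor is scheduled 4 ranges, but after the filter 
executors 0 and 1 scan 2 ranges each and executor 2 scans none.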

One option for remote partition filters might be to wait for the filters to 
arrive at the coordinator, apply the filters prior to scheduling, and then 
balance the surviving scan ranges during scheduling.  Another, more limited 
approach would be to use the filters only for balancing across fragments within 
a given executor: wait for the filter to arrive, prune the ranges assigned to 
that executor, and hand the surviving ranges out to fragments in a 
deterministic way.
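
A rough sketch of the second, executor-local approach, again in Python and 
again hypothetical: assign_after_filter, the range tuples and the filter 
predicate are invented names, and sorting followed by round-robin is just one 
possible way to make the hand-out deterministic, not a proposal for the actual 
implementation.

```python
def assign_after_filter(ranges, passes_filter, num_fragments):
    """Prune ranges with the filter, then deal the survivors out round-robin.

    Sorting the survivors first makes the result independent of the order in
    which the ranges were received, so the assignment is deterministic.
    """
    survivors = sorted(r for r in ranges if passes_filter(r))
    buckets = [[] for _ in range(num_fragments)]
    for i, r in enumerate(survivors):
        buckets[i % num_fragments].append(r)
    return buckets


# Hypothetical ranges assigned to one executor: 6 partitions x 2 files.
executor_ranges = [(p, f) for p in range(6) for f in range(2)]

# Hypothetical partition filter that has already arrived at this executor.
keep = {0, 3}

buckets = assign_after_filter(
    executor_ranges, lambda r: r[0] in keep, num_fragments=2
)
for fragment, assigned in enumerate(buckets):
    print(fragment, assigned)
```

The trade-off in this sketch is the same one described above: nothing is handed 
out until the filter has arrived, in exchange for each fragment getting an 
equal share of the ranges that survive.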



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
