[jira] [Resolved] (IMPALA-9637) Scan range load-balancing within backend

Tim Armstrong (Jira) Tue, 15 Dec 2020 11:55:04 -0800


     [ 
https://issues.apache.org/jira/browse/IMPALA-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tim Armstrong resolved IMPALA-9637.
-----------------------------------
    Resolution: Duplicate

> Scan range load-balancing within backend
> ----------------------------------------
>
>                 Key: IMPALA-9637
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9637
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: multithreading, performance
>
> Currently the scheduler statically divides scan ranges between fragment 
> instances, Since IMPALA-9015 it statically load-balances scan ranges based on 
> file size using the LPT algorithm in the schedule.
> This has various pitfalls:
>  * It interacts badly with dynamic partition pruning, which can filter out a 
> bunch of scan ranges and unbalance the laod
>  * Different files that have the same byte size may involve different amounts 
> of work to process for any number of reasons.
> Those can cause both inter-node load balance problems and intra-node load 
> balance problems. This Jira is about fixing the intra-node load balance 
> problem, so that the situation is no worse than before mt_dop.
> The proposed solution is to have a queue of scan ranges per backend, sorted 
> from largest to smallest, and have each instance pull scan ranges off that 
> queue. The DiskIOMgr ReaderContext probably is already sufficient to solve 
> this problem, and we'll need to add a different mechanism for Kudu, Hbase, 
> etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (IMPALA-9637) Scan range load-balancing within backend

Reply via email to