[
https://issues.apache.org/jira/browse/IMPALA-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-9637.
-----------------------------------
Resolution: Duplicate
> Scan range load-balancing within backend
> ----------------------------------------
>
> Key: IMPALA-9637
> URL: https://issues.apache.org/jira/browse/IMPALA-9637
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Affects Versions: Impala 4.0
> Reporter: Tim Armstrong
> Priority: Major
> Labels: multithreading, performance
>
> Currently the scheduler statically divides scan ranges between fragment
> instances, Since IMPALA-9015 it statically load-balances scan ranges based on
> file size using the LPT algorithm in the schedule.
> This has various pitfalls:
> * It interacts badly with dynamic partition pruning, which can filter out a
> bunch of scan ranges and unbalance the laod
> * Different files that have the same byte size may involve different amounts
> of work to process for any number of reasons.
> Those can cause both inter-node load balance problems and intra-node load
> balance problems. This Jira is about fixing the intra-node load balance
> problem, so that the situation is no worse than before mt_dop.
> The proposed solution is to have a queue of scan ranges per backend, sorted
> from largest to smallest, and have each instance pull scan ranges off that
> queue. The DiskIOMgr ReaderContext probably is already sufficient to solve
> this problem, and we'll need to add a different mechanism for Kudu, Hbase,
> etc.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)