[
https://issues.apache.org/jira/browse/IMPALA-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710273#comment-16710273
]
Peter Ebert commented on IMPALA-2424:
-------------------------------------
This is becoming increasingly important for scaling and for separating storage
from compute. If Impala is installed on a subset of nodes, or on distinct
compute-only nodes, remote reads are essentially random and cross-rack links
can become saturated; this is especially a problem at large scale, where
network over-subscription is common. With rack-aware scheduling and a proper
distribution of Impala and storage nodes per rack, traffic could be kept within
the top-of-rack (TOR) switches, improving performance.
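
As a rough sketch of the placement policy being asked for (not Impala's actual scheduler code; the class name, the executor/replica lists, and the host-to-rack map below are hypothetical), a rack-aware fallback would try a collocated executor first, then any executor in the same rack as a replica, and only then fall back to a random pick:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

// Hypothetical illustration of rack-aware fallback selection; names and types
// are made up for this sketch and do not reflect Impala's scheduler internals.
public class RackAwarePlacement {
    private final Map<String, String> hostToRack;  // e.g. "host3" -> "/rack1"
    private final Random random = new Random();

    public RackAwarePlacement(Map<String, String> hostToRack) {
        this.hostToRack = hostToRack;
    }

    /** Pick an executor for a scan range whose replicas live on replicaHosts. */
    public String pickExecutor(List<String> replicaHosts, List<String> executors) {
        // 1. Prefer a collocated executor (data-local read).
        Optional<String> local =
            executors.stream().filter(replicaHosts::contains).findFirst();
        if (local.isPresent()) return local.get();

        // 2. Otherwise prefer an executor in the same rack as any replica, so the
        //    remote read stays behind the same top-of-rack switch.
        for (String replica : replicaHosts) {
            String rack = hostToRack.get(replica);
            if (rack == null) continue;
            for (String exec : executors) {
                if (rack.equals(hostToRack.get(exec))) return exec;
            }
        }

        // 3. Fall back to a random executor (the current behavior for remote reads).
        return executors.get(random.nextInt(executors.size()));
    }
}
{code}

The host-to-rack map could be populated from the same topology information HDFS already maintains, so the scheduler would not need a new metadata source.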
> Rack-aware scheduling
> ---------------------
>
> Key: IMPALA-2424
> URL: https://issues.apache.org/jira/browse/IMPALA-2424
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Affects Versions: Impala 2.2.4
> Reporter: Marcel Kornacker
> Priority: Minor
> Labels: scalability, scheduling
>
> Currently, Impala makes an effort to schedule plan fragments local to the
> data that is being scanned; when no collocated impalad is available, the plan
> fragment is placed randomly.
> In order to support configurations where Impala is run on a subset of the
> nodes in a cluster, we should schedule fragments within the same rack that
> holds the assigned scan ranges (if a collocated impalad isn't available).
> See https://issues.apache.org/jira/browse/HADOOP-692 for details of how rack
> locality is recorded in HDFS.
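
For reference, the rack locality recorded in HDFS (per HADOOP-692) is exposed to clients through the block-location API; a minimal Java example, assuming a reachable HDFS and a file path passed as the first argument, could read it like this:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RackLocality {
    public static void main(String[] args) throws Exception {
        // Uses the cluster configuration on the classpath (core-site.xml, hdfs-site.xml).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path(args[0]);  // e.g. a data file backing a scan range
        FileStatus status = fs.getFileStatus(file);

        // Each BlockLocation carries its replicas' network-topology paths
        // (rack plus datanode name, e.g. "/rack1/10.0.0.3:50010").
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("block at offset %d:%n", loc.getOffset());
            for (String topologyPath : loc.getTopologyPaths()) {
                System.out.println("  replica at " + topologyPath);
            }
        }
        fs.close();
    }
}
{code}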