[ https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen reassigned MAPREDUCE-6631:
-------------------------------------
Assignee: Haibo Chen
> shuffle handler would benefit from per-local-dir threads
> --------------------------------------------------------
>
> Key: MAPREDUCE-6631
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.7.2, 3.0.0-alpha1
> Reporter: Nathan Roberts
> Assignee: Haibo Chen
>
> [~jlowe] and I discussed this while investigating I/O starvation we have been
> seeing on our clusters lately (possibly amplified by increased Tez
> workloads).
> If a particular disk is slow, it is very likely that all shuffle Netty
> threads will end up blocked on the read side of sendfile(). (sendfile() is
> asynchronous on the outbound socket side, but not on the read side.) With
> every thread waiting on that one disk, the entire shuffle subsystem slows
> down, even for requests served from healthy disks.
> It seems like we could make the Netty threads more asynchronous by
> introducing a small set of threads per local-dir that are responsible for the
> actual sendfile() invocations.
> This would not only improve shuffles that span drives, but also improve
> situations where there is a single large shuffle from a single local-dir. It
> would allow other drives to continue serving shuffle requests, AND avoid a
> large number of readers (2X number_of_cores by default) all fighting for the
> same drive, which becomes unfair to everything else on the system.
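A minimal Java sketch of the per-local-dir idea, under stated assumptions: `PerDirTransferDispatcher`, `transfer` and `threadsPerDir` are hypothetical names, not part of the ShuffleHandler. It uses plain java.nio `FileChannel.transferTo` (which the JDK may implement with sendfile() on Linux when the target is a socket) rather than Netty's file-region code path, and it assumes the output channel is blocking. What it illustrates is the hand-off: the caller (a Netty thread, in the proposal) only enqueues the copy and gets a future back, so a slow disk ties up only that disk's small pool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch: one small, bounded thread pool per local-dir, so the
 * blocking disk reads for a slow drive only stall that drive's pool.
 */
class PerDirTransferDispatcher {
    private final Map<Path, ExecutorService> pools = new LinkedHashMap<>();

    PerDirTransferDispatcher(List<Path> localDirs, int threadsPerDir) {
        for (Path dir : localDirs) {
            // Daemon threads so an unfinished pool never keeps the JVM alive.
            pools.put(dir.toAbsolutePath().normalize(),
                    Executors.newFixedThreadPool(threadsPerDir, r -> {
                        Thread t = new Thread(r);
                        t.setDaemon(true);
                        return t;
                    }));
        }
    }

    /** Returns the pool of the local-dir that contains {@code file}. */
    private ExecutorService poolFor(Path file) {
        Path abs = file.toAbsolutePath().normalize();
        for (Map.Entry<Path, ExecutorService> e : pools.entrySet()) {
            if (abs.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("not under any local-dir: " + file);
    }

    /**
     * Queues the copy of {@code file} to {@code out} on its disk's pool and
     * returns immediately; the future completes with the bytes transferred.
     */
    public CompletableFuture<Long> transfer(Path file, WritableByteChannel out) {
        ExecutorService pool = poolFor(file);
        return CompletableFuture.supplyAsync(() -> {
            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = in.size();
                long pos = 0;
                // transferTo may move fewer bytes than requested, so loop.
                while (pos < size) {
                    pos += in.transferTo(pos, size - pos, out);
                }
                return pos;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    public void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

Because each directory gets its own bounded pool, at most `threadsPerDir` readers hit any one drive at a time, instead of up to 2X number_of_cores readers competing for it. The sketch leaves ordering to the caller: two concurrent transfers to the same output channel would interleave their bytes.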
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)