JT should consider the disk each task is on before scheduling jobs...
---------------------------------------------------------------------
Key: HADOOP-2829
URL: https://issues.apache.org/jira/browse/HADOOP-2829
Project: Hadoop Core
Issue Type: Improvement
Reporter: eric baldeschwieler
The DataNode can support a JBOD config, where blocks exist on explicit disks.
But this information is not exported to or considered by the JT when assigning
tasks, which leads to non-optimal disk use. If 4 slots are used, 2 running
tasks will likely land on the same disk, and we observe them running more
slowly than other tasks on the same machine.
We could follow a number of strategies to address this.
For example: the DataNodes could support a "what disk is this block on?"
call. The JT could then discover that information and assign tasks
accordingly.
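
A minimal sketch of what that could look like, purely for discussion. None of
these names exist in Hadoop today; the BlockPlacementInfo interface, the
Candidate type, and the per-disk running-task counts (assumed to be reported
in the TT heartbeat) are all hypothetical:

    import java.util.List;

    // Illustration only; none of these APIs exist in Hadoop today.
    public class DiskAwareSchedulingSketch {

      /** Hypothetical addition to the DataNode protocol. */
      interface BlockPlacementInfo {
        /**
         * Index of the dfs.data.dir volume holding the block,
         * or -1 if the block is not stored on this DataNode.
         */
        int getDiskForBlock(long blockId);
      }

      /** Stand-in for a runnable map task with a known input block. */
      interface Candidate {
        long inputBlockId();
      }

      /**
       * Pick the candidate whose input block lives on the least
       * contended local disk. runningPerDisk[i] counts the tasks
       * currently reading volume i on this TaskTracker.
       */
      static Candidate pickTask(List<? extends Candidate> candidates,
                                int[] runningPerDisk,
                                BlockPlacementInfo placement) {
        Candidate best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Candidate c : candidates) {
          int disk = placement.getDiskForBlock(c.inputBlockId());
          // Non-local blocks contribute no local disk load.
          int load = (disk < 0) ? 0 : runningPerDisk[disk];
          if (load < bestLoad) {
            bestLoad = load;
            best = c;
          }
        }
        return best;
      }
    }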
Of course, the TT itself uses the disks for merge and temp space, and the
DataNodes on the same machine can be used by off-node clients, so it is not
clear that optimizing all of this is simple enough to be worth it.
This issue deserves study.