JT should consider the disk each task is on before scheduling jobs...
---------------------------------------------------------------------

                 Key: HADOOP-2829
                 URL: https://issues.apache.org/jira/browse/HADOOP-2829
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: eric baldeschwieler


The DataNode can support a JBOD configuration, where blocks live on explicit disks.  
But this information is not exported or considered by the JT when assigning 
tasks, which leads to non-optimal disk use.  If 4 slots are in use, 2 running 
tasks will likely be reading from the same disk, and we observe them running 
more slowly than other tasks on the same machine.

We could follow a number of strategies to address this.

For example: the DataNodes could support a "what disk is this block on" call.  
Then the JT could discover this info and assign tasks accordingly.
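A minimal sketch of what such a lookup might look like on the DataNode side. This is purely illustrative: the class and method names (BlockDiskLocator, register, diskOf) are hypothetical and not part of any existing Hadoop API; the real change would presumably extend the DataNode protocol so the JT can query it remotely.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a DataNode-side lookup mapping block IDs to the
 * local volume (disk) that stores them, which the JT could consult when
 * assigning tasks.  All names here are illustrative, not real Hadoop APIs.
 */
class BlockDiskLocator {
    private final Map<Long, Integer> blockToDisk = new HashMap<>();

    /** Record that the given block lives on volume diskIndex. */
    void register(long blockId, int diskIndex) {
        blockToDisk.put(blockId, diskIndex);
    }

    /** Return the disk index holding the block, or -1 if unknown. */
    int diskOf(long blockId) {
        return blockToDisk.getOrDefault(blockId, -1);
    }

    public static void main(String[] args) {
        BlockDiskLocator locator = new BlockDiskLocator();
        locator.register(1001L, 0);  // block 1001 on disk 0
        locator.register(1002L, 3);  // block 1002 on disk 3
        System.out.println(locator.diskOf(1001L));  // 0
        System.out.println(locator.diskOf(1002L));  // 3
        System.out.println(locator.diskOf(9999L));  // -1: block not local
    }
}
```

With per-block disk info available, the JT could prefer assigning two concurrent tasks whose input blocks sit on different disks of the same node.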

Of course, the TT itself uses the disks for merge and temp space, and the 
DataNodes on the same machine can also be serving off-node readers, so it is 
not clear that optimizing all of this is simple enough to be worth it.

This issue deserves study.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.