[
https://issues.apache.org/jira/browse/HADOOP-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539753
]
eric baldeschwieler commented on HADOOP-2093:
---------------------------------------------
An easier solution might simply be to schedule more blocks to be read at once.
This will saturate the disk system with less complexity...
> DFS should provide partition information for blocks, and map/reduce should
> avoid scheduling mappers with splits on the same file system partition at
> the same time
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2093
> URL: https://issues.apache.org/jira/browse/HADOOP-2093
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: Runping Qi
>
> The summary is a bit long. But the basic idea is to better utilize
> multiple file system partitions.
> For example, in a map reduce job, suppose we have 100 splits local to a
> node, and these 100 splits are spread across 4 file system partitions.
> If we allow 4 mappers to run concurrently, it is better that the mappers
> each work on splits on different file system partitions. In the worst
> case, if all the mappers work on splits on the same file system
> partition, the other three file system partitions are not utilized at
> all.
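The partition-aware scheduling idea above could be sketched as a greedy round-robin over partitions. This is a minimal illustration in Python, not Hadoop's actual InputSplit/TaskTracker API; the `pick_splits` function, the `(split_id, partition)` pair representation, and the mapper-slot count are assumptions for the sketch:

```python
from collections import defaultdict

def pick_splits(splits, num_mappers):
    """Greedily choose up to num_mappers splits so that concurrent
    mappers land on different file system partitions when possible.
    splits: list of (split_id, partition) pairs (hypothetical shape)."""
    # Group candidate splits by the partition they live on.
    by_partition = defaultdict(list)
    for split_id, partition in splits:
        by_partition[partition].append(split_id)
    # Round-robin across partitions so the first num_mappers picks
    # spread the disk load instead of piling onto one partition.
    queues = list(by_partition.values())
    chosen = []
    i = 0
    while len(chosen) < num_mappers and any(queues):
        q = queues[i % len(queues)]
        if q:
            chosen.append(q.pop(0))
        i += 1
    return chosen
```

With 5 local splits spread over 4 partitions and 4 mapper slots, the 4 chosen splits each come from a different partition, matching the behavior the description asks for.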
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.