[ https://issues.apache.org/jira/browse/HADOOP-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676028#action_12676028 ]

Doug Cutting commented on HADOOP-5299:
--------------------------------------

> The pread performance of hdfs is too low to be used in the shuffle.

Do you think this is an inherent problem, or just a problem with the current 
pread implementation?  If, in the vast majority of cases, the blocks were 
local to the reduce node, then pread performance should not be terrible, 
should it?
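
For concreteness, the access pattern in question is roughly the following 
sketch against the FileSystem API (the class name, path and offset arguments 
are illustrative only, not names from the current shuffle code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  // Fetch one map-output segment with a positioned read.
  public static byte[] fetchSegment(Configuration conf, Path mapOutput,
                                    long offset, int length) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = fs.open(mapOutput);
    try {
      byte[] buf = new byte[length];
      // readFully(position, ...) does not move the stream's file pointer,
      // so one open stream could serve many concurrent segment fetches.
      in.readFully(offset, buf, 0, length);
      return buf;
    } finally {
      in.close();
    }
  }
}

If the block backing mapOutput sits on the same node as the reducer, the 
bytes still go through the local datanode, but over a local socket rather 
than the network, so the cost ought to be close to a plain local read.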

> Clearly the goal is to remove the task tracker's attempts at managing local 
> store.

Yes.  So, if mappers still need to use the local store, then this is dead in 
the water.  Unfortunately, as several have pointed out, map outputs are too 
numerous to reasonably store in HDFS.  So the question then becomes: can we 
arrange things so that map spills always go to the net, rather than to local 
disk?  This implies that map outputs would be pushed to reduce nodes rather 
than pulled, that maps could block if reduce nodes were busy, etc.  Big 
changes, sure, but possible?
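
Purely as a strawman, the push-side contract might look something like the 
interface below; nothing like it exists in the framework today, and all of 
the names are made up for the sake of discussion.

// Hypothetical: a reduce node exposes this to mappers that want to push
// spill data to it directly instead of writing it to local disk.
public interface MapSpillTarget {
  /**
   * Accept a spill segment for one reduce partition.  A busy reduce node
   * could block here (or reject the call), which is where the
   * back-pressure on mappers would come from.
   */
  void acceptSpill(int reducePartition, byte[] data, int offset, int length)
      throws java.io.IOException;
}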

> Reducer inputs should be spilled to HDFS rather than local disk.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-5299
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5299
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>
> Currently, both map outputs and reduce inputs are stored on the local disks 
> of tasktrackers. (Un)availability of local disk space for intermediate data 
> is seen as a major factor in job failures. 
> The suggested solution is to store this intermediate data on HDFS (maybe 
> with a replication factor of 1). However, the main blocker with that 
> solution is that the large number of temporary names (proportional to the 
> total number of maps) can overwhelm the namenode, especially since map 
> outputs are typically small (most produce a single block of output).
> Also, as we see in many applications, map output sizes can be estimated 
> fairly accurately, so users can plan accordingly based on available local 
> disk space.
> However, reduce input sizes can vary a lot, especially for skewed data (or 
> because of bad partitioning).
> So, I suggest that it makes more sense to keep map outputs on local disks, 
> but that the reduce inputs (when spilled from reducer memory) should go to 
> HDFS.
> Adding a configuration variable to indicate the filesystem to be used for 
> reduce-side spills would let us experiment with this new scheme and compare 
> its efficiency.
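
As a rough illustration of that last suggestion, the reduce-side spill hook 
could be wired up along the lines below.  The property name 
"mapred.reduce.spill.fs" is hypothetical; replication 1 and the local-disk 
default follow the description above.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReduceSpillSketch {
  // Open a reduce-side spill file on whatever filesystem the (hypothetical)
  // property names; when unset, fall back to the local filesystem, which is
  // today's behaviour.
  public static FSDataOutputStream createSpill(Configuration conf, Path spillFile)
      throws IOException {
    String fsUri = conf.get("mapred.reduce.spill.fs", "file:///");
    FileSystem fs = FileSystem.get(URI.create(fsUri), conf);
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    short replication = 1;  // transient data: one copy, per the description
    long blockSize = fs.getDefaultBlockSize();
    return fs.create(spillFile, true, bufferSize, replication, blockSize);
  }
}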

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
