[ 
https://issues.apache.org/jira/browse/HADOOP-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616622#action_12616622
 ] 

Devaraj Das commented on HADOOP-2771:
-------------------------------------

Christian, how big are the map outputs, and what is the value of io.sort.mb? 
Together these give a rough estimate of the number of spills a map does. If 
the map spills multiple times while it is running, one suspect is that we are 
spending too much time in the final merge across the spill segments (the merge 
that produces the final map output). We'd also do a lot of seeks during that 
merge. It'd be nice to remove those seeks before merging the segments that 
belong to a given partition, since we ultimately read the files sequentially 
anyway.

But again, the above depends on whether the map does multiple spills or not. 
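
The estimate described above could be sketched roughly as follows. This is 
only a back-of-the-envelope model, not Hadoop code: the function names, the 
spill_fraction parameter, and the 1 GiB map output size are hypothetical, and 
it assumes one segment per partition per spill and that the sort buffer 
spills once it is full. The reduce counts are the ones from the issue 
description.

```python
import math

MIB = 1024 * 1024


def estimate_spills(map_output_bytes, io_sort_mb, spill_fraction=1.0):
    """Rough number of spills for one map task.

    spill_fraction is an assumed knob for how full the sort buffer gets
    before it is spilled; 1.0 means "spill only when completely full".
    """
    usable_buffer_bytes = io_sort_mb * MIB * spill_fraction
    return max(1, math.ceil(map_output_bytes / usable_buffer_bytes))


def estimate_merge_segments(num_spills, num_reduces):
    """Segments the final merge would touch, assuming each spill holds
    one segment per partition (i.e. per reduce)."""
    return num_spills * num_reduces


if __name__ == "__main__":
    # Hypothetical inputs: 1 GiB of map output, io.sort.mb = 100.
    spills = estimate_spills(1024 * MIB, io_sort_mb=100)
    for reduces in (2500, 7500, 25000):
        segments = estimate_merge_segments(spills, reduces)
        print(f"reduces={reduces} spills={spills} segments={segments}")
```

If the map only spills once, the segment count is just the number of 
reduces and the concern about seeks in the final merge probably does not 
apply.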

> changing the number of reduces dramatically changes the time of the map time
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2771
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2771
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Owen O'Malley
>
> By changing the number of reduces, the time for an individual map changes 
> radically. By running the same program and data with different numbers of 
> reduces (2500, 7500, 25000) the times for each map changed radically (0:50, 
> 1:20, 5h).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
