[ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642804#action_12642804 ]

Owen O'Malley commented on HADOOP-2774:
---------------------------------------

Ravi, my statement was about the map side. If no records are spilled, each map
writes each output record exactly once. Every record that is spilled before the
final merge is, by definition, read exactly once, when it is merged into the
next larger file. On the reduce side, the number of records written in the
shuffle is exactly the same as the number of records read.
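
The map-side accounting above can be sketched as follows. This is a simplified, hypothetical model, not Hadoop's actual merge code: the function name map_side_disk_io is illustrative, and it assumes each merge pass combines the sort_factor smallest segments, which is not necessarily the exact pass schedule Hadoop uses. It shows the invariant that records written minus records read equals the number of map output records.

```python
def map_side_disk_io(spill_sizes, sort_factor):
    """Return (records_written, records_read) for one map task's spills and merges.

    Hypothetical model: each pass merges the sort_factor smallest segments,
    and the last pass writes the final map output file.
    """
    if sort_factor < 2:
        raise ValueError("sort_factor must be at least 2")
    written = sum(spill_sizes)  # every spill file is written once
    read = 0
    segments = sorted(spill_sizes)
    while len(segments) > 1:
        width = min(sort_factor, len(segments))
        merged = sum(segments[:width])
        read += merged     # each input segment is read exactly once...
        written += merged  # ...and written again into the next larger file
        segments = sorted(segments[width:] + [merged])
    return written, read


print(map_side_disk_io([100], 10))     # single spill, no merge: (100, 0)
print(map_side_disk_io([10] * 10, 4))  # (280, 180); 280 - 180 == 100 output records
```

Because every extra write is matched by exactly one read, counting writes is enough to recover the reads on the map side.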

Runping, for the most part, the important piece of information is how many
records we read from and write to disk, since disk I/O is the major performance
cost. For instance, writing 10 files of 1M records each and writing 2 files of
5M records each will perform very similarly, because both write 10M records and
are bound by disk I/O time. Conversely, reading and writing 10M records versus
5M records would make a substantial difference.
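
A back-of-the-envelope version of that comparison, assuming disk time scales with the number of records moved and ignoring per-file open and seek overhead. The helper records_moved is purely illustrative.

```python
def records_moved(files):
    """Total records across (number_of_files, records_per_file) groups."""
    return sum(n_files * per_file for n_files, per_file in files)


ten_small = records_moved([(10, 1_000_000)])
two_large = records_moved([(2, 5_000_000)])

# Same total records, so (under the stated assumption) similar I/O-bound time.
print(ten_small == two_large)  # True

# Halving the records moved roughly halves the I/O-bound time.
print(records_moved([(1, 10_000_000)]) / records_moved([(1, 5_000_000)]))  # 2.0
```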

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly, in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let users easily see whether they 
> have values like io.sort.mb or io.sort.factor set too low.
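
The proposed per-pass counter can be illustrated with a small simulation. This is a hypothetical sketch, not Hadoop code: sort_merge_counter is an illustrative name, spill_sizes stands in for the effect of io.sort.mb (a smaller buffer gives more, smaller spills), and sort_factor stands in for io.sort.factor. It assumes each merge pass combines the sort_factor smallest segments, which simplifies Hadoop's real pass schedule. It reproduces the 100 and 200 examples from the description and shows a multi-level total that is not a multiple of the output count.

```python
import heapq


def sort_merge_counter(spill_sizes, sort_factor):
    """Records processed over every sort and merge pass (hypothetical model)."""
    if sort_factor < 2:
        raise ValueError("sort_factor must be at least 2")
    # Sort pass: every spilled record is sorted once in memory.
    counter = sum(spill_sizes)
    segments = list(spill_sizes)
    heapq.heapify(segments)
    # Merge passes: repeatedly merge the smallest segments; one spill needs none.
    while len(segments) > 1:
        width = min(sort_factor, len(segments))
        merged = sum(heapq.heappop(segments) for _ in range(width))
        counter += merged
        heapq.heappush(segments, merged)
    return counter


print(sort_merge_counter([100], 10))     # sorted once: 100
print(sort_merge_counter([50, 50], 10))  # two spills merged: 200
print(sort_merge_counter([10] * 10, 4))  # multi-level merge: 280
```

A counter well above the number of map output records suggests extra merge passes, i.e. io.sort.mb or io.sort.factor may be set too low.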

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
