[jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces

Owen O'Malley (JIRA) Sun, 26 Oct 2008 16:13:35 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642828#action_12642828 ]


Owen O'Malley commented on HADOOP-2774:
---------------------------------------

Sure, and my proposal includes the records written as intermediates in the 
merge. Roughly, it looks like this:

case 1 = first level spill (10M row writes) + second level (10M row writes) + final write (10M row writes) = 30M
case 2 = first level spill (10M row writes) + final write (10M row writes) = 20M

which reflects the difference between 3 and 2 merge levels.

However, consider case 3, where a map writes 500 spills of 20,000 records each 
(10M records in total). It would be merged as:
second level: 5 pieces + 100 pieces + 100 pieces + 100 pieces + 100 pieces
final level: 5 big pieces + 95 small pieces
total = first level (10M) + second level (405 * 20k = 8.1M) + final write 
(10M) = 28.1M

which is a much better indicator of performance than either the number of 
levels (3) or the number of first-level spills (500).
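The totals above can be reproduced with a short size-only simulation of a multi-level merge. This is a sketch under stated assumptions, not Hadoop's merge code: it assumes a merge factor of 100 for case 3 (implied by the 100-piece merges above), that each intermediate pass merges the smallest segments first, and that the first pass merges only enough segments for every later pass to be a full factor-way merge (which is what yields the initial 5-piece merge). The function name `records_written` is hypothetical, and the spill layouts used for cases 1 and 2 are illustrative guesses, since the comment does not give them.

```python
def records_written(spills, factor):
    """Count record writes across all sort/merge levels for one task.

    Hypothetical model: spills count as the first level, intermediate
    passes merge the smallest segments first, and the final merge writes
    every record once more. A single spill needs no merge at all.
    """
    if factor < 2:
        raise ValueError("merge factor must be at least 2")
    total = sum(spills)  # first level: each record is written once when spilled
    if len(spills) <= 1:
        return total  # already sorted once; nothing to merge
    segments = sorted(spills)
    first_pass = True
    while len(segments) > factor:
        n = factor
        if first_pass:
            # Merge just enough segments now so that every later pass is a
            # full factor-way merge (5 of the 500 spills in case 3).
            mod = (len(segments) - 1) % (factor - 1)
            n = factor if mod == 0 else mod + 1
            first_pass = False
        merged = sum(segments[:n])  # merge the n smallest segments
        total += merged  # intermediate level rewrites these records
        segments = sorted(segments[n:] + [merged])
    return total + sum(segments)  # final merge writes every record again


# Case 3 from the comment: 500 spills of 20,000 records, 100-way merges.
print(records_written([20_000] * 500, 100))  # 28100000
# Hypothetical layouts matching cases 2 and 1: 100 spills of 100,000 records.
print(records_written([100_000] * 100, 100))  # 20000000 (2 levels)
print(records_written([100_000] * 100, 10))   # 30000000 (3 levels)
```

Counting records rather than levels or spills is what separates case 3 (28.1M) from case 1 (30M), even though both take three levels.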

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.
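The counting rule in the description can be illustrated with a toy in-memory model. The function `sort_and_merge` and its `spill_size` parameter are hypothetical stand-ins for a map-side spill bounded by io.sort.mb, and all spills are merged in a single pass (as if io.sort.factor were large enough), so this sketch reproduces the 100 and 200 examples but does not model a multi-level merge.

```python
import heapq


def sort_and_merge(records, spill_size):
    """Sort records in spills of at most spill_size, then merge the spills.

    Returns (sorted_records, counter); counter grows by one for every
    record written by each sort or merge pass. Hypothetical model only.
    """
    counter = 0
    spills = []
    for start in range(0, len(records), spill_size):
        spill = sorted(records[start:start + spill_size])  # sort pass
        counter += len(spill)
        spills.append(spill)
    if len(spills) == 1:
        return spills[0], counter  # sorted once, no merge needed
    merged = []
    for record in heapq.merge(*spills):  # one merge pass over all spills
        merged.append(record)
        counter += 1
    return merged, counter


data = list(range(100, 0, -1))  # 100 map output records, in reverse order
print(sort_and_merge(data, 100)[1])  # 100: sorted once
print(sort_and_merge(data, 50)[1])   # 200: spilled twice, then merged
```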

-- 
This message is automatically generated by JIRA.