[jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces

Runping Qi (JIRA) Sun, 26 Oct 2008 10:51:35 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642808#action_12642808
 ]


Runping Qi commented on HADOOP-2774:
------------------------------------


Owen, I think the number of spills is more important than you think.
Consider these two cases:

1. A task writes 1000 spills of 10000 records each, totaling 10M records.
2. A task writes 100 spills of 100000 records each, totaling 10M records.

With a merge factor of 100, case 2 does not need a second-level merge, while 
case 1 does.
Case 1 therefore requires 2x the reads/writes of case 2.
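
As a rough check on the arithmetic for the two cases, here is a minimal sketch in Python of a simplified merge model. It assumes every pass merges runs in groups of at most the merge factor and reads and writes every record once per pass; the real Hadoop merge may schedule its passes differently, and the function names are hypothetical, not part of any Hadoop API.

```python
def merge_passes(num_spills: int, factor: int) -> int:
    """Count the passes needed to reduce num_spills sorted runs to one.

    Simplified model (an assumption, not Hadoop's actual scheduling):
    each pass merges the runs in groups of at most `factor`.
    """
    if factor < 2:
        raise ValueError("merge factor must be at least 2")
    passes = 0
    runs = num_spills
    while runs > 1:
        runs = (runs + factor - 1) // factor  # ceiling division
        passes += 1
    return passes


def records_merged(num_spills: int, records_per_spill: int, factor: int) -> int:
    """Total records read and written across all merge passes in this model."""
    total_records = num_spills * records_per_spill
    return merge_passes(num_spills, factor) * total_records


if __name__ == "__main__":
    for spills, per_spill in [(1000, 10_000), (100, 100_000)]:
        print(spills, "spills:",
              merge_passes(spills, 100), "passes,",
              records_merged(spills, per_spill, 100), "records merged")
```

Under this model, both tasks produce the same 10M records, but only the spill count tells you that the first one pays for an extra merge level.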


> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
