[ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646889#action_12646889
 ] 

Sharad Agarwal commented on HADOOP-2774:
----------------------------------------

Instead of adding counter-update code all over the codebase, what if we 
intercept append() in IFile.Writer? I believe all intermediate record writes 
happen through it. Something like this could be added to Task.java, and 
MapTask/ReduceTask could use it to create the Writer:

{code}
  // Returns a Writer whose append() calls also bump the spilled-records counter.
  // spilledRecordsCounter is assumed to be a counter field available on Task.
  protected Writer createWriter(Configuration conf, FSDataOutputStream out,
      Class keyClass, Class valueClass, CompressionCodec codec)
      throws IOException {
    return new Writer(conf, out, keyClass, valueClass, codec) {
      @Override
      public void append(Object key, Object value) throws IOException {
        super.append(key, value);
        spilledRecordsCounter.increment(1);
      }

      @Override
      public void append(DataInputBuffer key, DataInputBuffer value)
          throws IOException {
        super.append(key, value);
        spilledRecordsCounter.increment(1);
      }
    };
  }
{code}
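
To make the idea easier to try outside Hadoop, here is a minimal, 
self-contained sketch of the same interception pattern. RecordWriter, 
SpillCountingTask and SpillCounterDemo are hypothetical stand-ins, not the 
real IFile.Writer or Task classes, and an AtomicLong stands in for the Hadoop 
counter. It assumes every spill and merge pass writes through a Writer 
obtained from the factory method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for IFile.Writer: it only stores the records it is given.
class RecordWriter {
    private final List<String> sink = new ArrayList<>();

    public void append(String key, String value) {
        sink.add(key + "\t" + value);
    }

    public int size() {
        return sink.size();
    }
}

// Hypothetical stand-in for Task: owns the counter and hands out counting writers.
class SpillCountingTask {
    final AtomicLong spilledRecords = new AtomicLong();

    RecordWriter createWriter() {
        // Anonymous subclass intercepts append() so callers need no counter code.
        return new RecordWriter() {
            @Override
            public void append(String key, String value) {
                super.append(key, value);
                spilledRecords.incrementAndGet();
            }
        };
    }
}

public class SpillCounterDemo {
    // Simulates two spills followed by one merge pass over all records,
    // and returns the total number of records written through counting writers.
    static long simulate(int records) {
        SpillCountingTask task = new SpillCountingTask();
        int firstHalf = records / 2;

        RecordWriter spill1 = task.createWriter();
        for (int i = 0; i < firstHalf; i++) {
            spill1.append("k" + i, "v" + i);
        }

        RecordWriter spill2 = task.createWriter();
        for (int i = firstHalf; i < records; i++) {
            spill2.append("k" + i, "v" + i);
        }

        // The merge pass rewrites every record once more.
        RecordWriter merged = task.createWriter();
        for (int i = 0; i < records; i++) {
            merged.append("k" + i, "v" + i);
        }

        return task.spilledRecords.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(100)); // prints 200
    }
}
```

With 100 map output records, two spills and one merge, the sketch reports 
200, which matches the example in the issue description. The point of the 
design is that the callers stay unchanged: only the factory method knows 
about the counter.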

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.

