[ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649290#action_12649290
 ] 

Ravi Gummadi commented on HADOOP-2774:
--------------------------------------

I see two approaches now:
(1) Pass a formal Counters object to the IFile Reader/Writer and Merger.merge 
APIs. I think this would work; the only problem, as Devaraj pointed out, is 
that it makes the IFile and Merger classes dependent on a core MapReduce 
Counters feature. The IFile/Merger classes currently do mostly IO-related 
work and know nothing about mapred.Counters, Tasks, etc.

(2) The other approach is a callback mechanism. Define an interface such as 
IFileDiskOperationsMonitor with two methods, recordsReadFromDisk(long num) and 
recordsWrittenToDisk(long num). IFile.Reader could invoke 
IFileDiskOperationsMonitor.recordsReadFromDisk(numRead) whenever 
Reader.close() is called. The Task class could implement the interface and 
update its own copy of the relevant Counter at that point. Similarly, 
IFile.Writer would invoke 
IFileDiskOperationsMonitor.recordsWrittenToDisk(numWritten).
The IFile.Reader, IFile.Writer and Merger classes could take an additional 
IFileDiskOperationsMonitor argument in their constructors (or expose a 
setter for it). The argument would be the Task object itself, since it 
implements the interface.
This seems fairly generic, and in the future it could be used by any other 
potential user of the IFile/Merger classes (outside MapReduce).
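
To make approach (2) concrete, here is a rough sketch of how the callback 
could be wired up. Only the interface name and its two methods come from the 
proposal above; SketchWriter, SketchReader and SketchTask are hypothetical 
stand-ins for IFile.Writer, IFile.Reader and Task, and they do no real IO. 
This is not the actual patch, only an illustration under those assumptions.

```java
// Callback interface as proposed in approach (2).
interface IFileDiskOperationsMonitor {
    void recordsReadFromDisk(long num);
    void recordsWrittenToDisk(long num);
}

// Hypothetical stand-in for IFile.Writer: counts locally, reports on close().
class SketchWriter {
    private final IFileDiskOperationsMonitor monitor;
    private long numWritten = 0;
    private boolean closed = false;

    SketchWriter(IFileDiskOperationsMonitor monitor) {
        this.monitor = monitor;
    }

    void append(String key, String value) {
        // A real writer would serialize the record to disk here.
        numWritten++;
    }

    void close() {
        if (closed) {
            return; // report only once, even if close() is called twice
        }
        closed = true;
        if (monitor != null) {
            monitor.recordsWrittenToDisk(numWritten);
        }
    }
}

// Hypothetical stand-in for IFile.Reader: pretends the file holds a fixed
// number of records (an assumption made to keep the sketch self-contained).
class SketchReader {
    private final IFileDiskOperationsMonitor monitor;
    private long remaining;
    private long numRead = 0;
    private boolean closed = false;

    SketchReader(long recordsInFile, IFileDiskOperationsMonitor monitor) {
        this.remaining = recordsInFile;
        this.monitor = monitor;
    }

    boolean next() {
        if (remaining == 0) {
            return false;
        }
        remaining--;
        numRead++;
        return true;
    }

    void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (monitor != null) {
            monitor.recordsReadFromDisk(numRead);
        }
    }
}

// Hypothetical stand-in for Task: implements the interface and keeps its own
// totals, which a real Task would forward to its Counters.
class SketchTask implements IFileDiskOperationsMonitor {
    private long recordsRead = 0;
    private long recordsWritten = 0;

    @Override
    public void recordsReadFromDisk(long num) {
        recordsRead += num;
    }

    @Override
    public void recordsWrittenToDisk(long num) {
        recordsWritten += num;
    }

    long getRecordsRead() {
        return recordsRead;
    }

    long getRecordsWritten() {
        return recordsWritten;
    }
}
```

The point of the sketch is that the reader and writer only see the small 
interface, so they stay free of any dependency on mapred.Counters, while the 
task decides what to do with the numbers it is handed.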

Thoughts?

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
