[
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649290#action_12649290
]
Ravi Gummadi commented on HADOOP-2774:
--------------------------------------
I see two approaches now:
(1) Pass a formal Counters object to the IFile Reader/Writer and Merger.merge
APIs. I think this would work, but the problem, as Devaraj pointed out, is
that it makes the IFile and Merger classes dependent on a core MapReduce
feature, Counters. The IFile/Merger classes currently do mostly IO-related
work and know nothing about mapred.Counters, Tasks, etc.
(2) The other approach is a callback mechanism. Define an interface, say
IFileDiskOperationsMonitor, with two methods: recordsReadFromDisk(long num)
and recordsWrittenToDisk(long num). IFile.Reader could invoke
IFileDiskOperationsMonitor.recordsReadFromDisk(numRead) whenever
Reader.close() is called. The Task class could implement the interface and
update its own copy of the relevant Counter at that point. Similarly,
IFile.Writer would invoke
IFileDiskOperationsMonitor.recordsWrittenToDisk(numWritten).
The IFile.Reader, IFile.Writer and Merger classes could take an additional
IFileDiskOperationsMonitor argument in their constructors (or could expose
setters instead). The argument would be the Task object itself, since it
implements the interface.
This seems fairly generic, and in the future it could be used by any other
potential user of the IFile/Merger classes (outside MapReduce).
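To make approach (2) concrete, here is a rough Java sketch of the callback
idea. Only the interface name and its two methods come from the proposal
above; SketchReader, SketchWriter and SketchTask are hypothetical stand-ins
for IFile.Reader, IFile.Writer and Task, and none of this is existing Hadoop
code. It assumes the count is reported once, when close() is called.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Callback interface as proposed in this comment; not an existing Hadoop API. */
interface IFileDiskOperationsMonitor {
    void recordsReadFromDisk(long num);
    void recordsWrittenToDisk(long num);
}

/** Hypothetical stand-in for IFile.Reader: counts records, reports on close(). */
class SketchReader implements AutoCloseable {
    private final IFileDiskOperationsMonitor monitor;
    private long numRead = 0;
    private boolean closed = false;

    SketchReader(IFileDiskOperationsMonitor monitor) {
        this.monitor = monitor;
    }

    /** A real reader would deserialize a key/value pair here. */
    void next() {
        numRead++;
    }

    @Override
    public void close() {
        if (closed) {
            return; // report only once, even if close() is called again
        }
        closed = true;
        if (monitor != null) {
            monitor.recordsReadFromDisk(numRead);
        }
    }
}

/** Hypothetical stand-in for IFile.Writer: counts records, reports on close(). */
class SketchWriter implements AutoCloseable {
    private final IFileDiskOperationsMonitor monitor;
    private long numWritten = 0;
    private boolean closed = false;

    SketchWriter(IFileDiskOperationsMonitor monitor) {
        this.monitor = monitor;
    }

    /** A real writer would serialize a key/value pair here. */
    void append() {
        numWritten++;
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (monitor != null) {
            monitor.recordsWrittenToDisk(numWritten);
        }
    }
}

/** Hypothetical stand-in for Task: owns the counters, so the IO classes need not. */
class SketchTask implements IFileDiskOperationsMonitor {
    private final AtomicLong recordsRead = new AtomicLong();
    private final AtomicLong recordsWritten = new AtomicLong();

    @Override
    public void recordsReadFromDisk(long num) {
        recordsRead.addAndGet(num);
    }

    @Override
    public void recordsWrittenToDisk(long num) {
        recordsWritten.addAndGet(num);
    }

    long getRecordsRead() {
        return recordsRead.get();
    }

    long getRecordsWritten() {
        return recordsWritten.get();
    }
}
```

In this sketch the task would pass itself in, e.g. new SketchReader(task), and
the reader/writer only ever see the small interface, never mapred.Counters.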
Thoughts?
> Add counters to show number of key/values that have been sorted and merged in
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2774
> URL: https://issues.apache.org/jira/browse/HADOOP-2774
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Ravi Gummadi
> Fix For: 0.20.0
>
> Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of
> records. So for example, if the map output 100 records and they were sorted
> once, the counter would be 100. If it spilled twice and was merged together,
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of
> the number of map output records. This would let the users easily see if they
> have values like io.sort.mb or io.sort.factor set too low.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.