[
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ravi Gummadi updated HADOOP-2774:
---------------------------------
Attachment: HADOOP-2774.patch
New patch attached here.
As Sharad pointed out, since most of the writes to disk happen through
IFile.Writer.append(), this new code maintains a count of records written.
I verified the spilled records count with wordcount example and test inputs
with different values of io.sort.mb, io.sort.factor and different number of
maps & reduces.
Please do the review of code changes.
> Add counters to show number of key/values that have been sorted and merged in
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2774
> URL: https://issues.apache.org/jira/browse/HADOOP-2774
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Ravi Gummadi
> Fix For: 0.20.0
>
> Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of
> records. So for example, if the map output 100 records and they were sorted
> once, the counter would be 100. If it spilled twice and was merged together,
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of
> the number of map output records. This would let the users easily see if they
> have values like io.sort.mb or io.sort.factor set too low.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.