[ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649611#action_12649611 ]
Jothi Padmanabhan commented on HADOOP-2774:
-------------------------------------------
bq. Does the merger require two different counters? Isn't it sufficient to pass
an optional counter for each record the merger emits? The caller should have
enough context to know whether the segments are coming to/from disk or
memory... right?
The two counters count the number of records read and the number of records
written; they are not meant to determine whether the records came from disk or
memory. Since Merger does not know whether a particular call comes from Map or
Reduce, we need to pass both counters and use the appropriate one at the
Map/Reduce task level. Can we do without this?
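
To make the two-counter idea concrete, here is a minimal sketch in Python rather than the Java of the actual patch. The names `Counter` and `merge_segments` are hypothetical and do not correspond to Hadoop's Merger API. It only illustrates a merge routine that takes an optional "records read" counter and an optional "records written" counter and leaves it to the caller to decide which ones to supply.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator, Optional


@dataclass
class Counter:
    """Hypothetical stand-in for a task-level counter (not a Hadoop class)."""
    value: int = 0

    def increment(self, amount: int = 1) -> None:
        self.value += amount


def merge_segments(
    segments: Iterable[Iterable[int]],
    read_counter: Optional[Counter] = None,
    write_counter: Optional[Counter] = None,
) -> Iterator[int]:
    """Merge already-sorted segments, counting records read and written.

    The merge itself has no notion of map versus reduce; it simply bumps
    whichever counters the caller handed in.
    """

    def counted(segment: Iterable[int]) -> Iterator[int]:
        # Count each record as it is pulled from an input segment.
        for record in segment:
            if read_counter is not None:
                read_counter.increment()
            yield record

    for record in merge(*(counted(s) for s in segments)):
        # Count each record as it is emitted to the merged output.
        if write_counter is not None:
            write_counter.increment()
        yield record
```

A caller that only cares about one of the two counts can pass `None` for the other, which is the kind of decision the comment above places at the Map/Reduce task level.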
> Add counters to show number of key/values that have been sorted and merged in
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2774
> URL: https://issues.apache.org/jira/browse/HADOOP-2774
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Ravi Gummadi
> Fix For: 0.20.0
>
> Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of
> records. So for example, if the map output 100 records and they were sorted
> once, the counter would be 100. If it spilled twice and was merged together,
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of
> the number of map output records. This would let the users easily see if they
> have values like io.sort.mb or io.sort.factor set too low.
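
The per-pass arithmetic in the description above can be sketched as follows. This is a simplified model and an assumption, not Hadoop's actual merge policy: the hypothetical function `records_counted` counts every record once for the sort, then repeatedly merges the `factor` smallest spills until at most `factor` remain, and finally counts one last merge over everything.

```python
import heapq
from typing import Sequence


def records_counted(spill_sizes: Sequence[int], factor: int) -> int:
    """Total records counted across the sort pass and all merge passes.

    Assumed model (not Hadoop's real algorithm): intermediate merges
    always combine the `factor` smallest segments.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")

    total = sum(spill_sizes)  # sort pass: every record counted once
    if len(spill_sizes) <= 1:
        return total  # a single spill needs no merge

    sizes = list(spill_sizes)
    heapq.heapify(sizes)
    while len(sizes) > factor:
        # Intermediate merge: only the records in these segments are counted.
        merged = sum(heapq.heappop(sizes) for _ in range(factor))
        total += merged
        heapq.heappush(sizes, merged)

    total += sum(sizes)  # final merge touches every record once more
    return total
```

Under this model, `records_counted([100], 10)` gives 100 and `records_counted([50, 50], 10)` gives 200, matching the two cases in the description. With three spills and a merge factor of 2, `records_counted([40, 30, 30], 2)` gives 260, which is not a multiple of the 100 map output records.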