[
https://issues.apache.org/jira/browse/HADOOP-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655869#action_12655869
]
Chris Douglas commented on HADOOP-4845:
---------------------------------------
bq. Am I right in that the counter is the number of bytes that is fed into
"reduce task"? For "reduce function" we only have a concept of records not
bytes, since "reduce function" accepts objects not byte arrays/streams.
Good point, but an existing counter "map output bytes" doesn't follow these
semantics. It counts serialized bytes of records out of the map function, not
the bytes output from MapTask. It seems confusing to accept different
terminology for the reduce side.
bq. by shuffled bytes, you mean the number of records mappers output to a
specific reducer or to all reducers?
No, I mean the value this counter is half-tracking, i.e. the number of bytes
fetched from all completed maps by the reduce.
As implemented, I'm not sure the counter will be correct when intermediate
compression is on and a map output is too large to fetch into memory. When
fetched into memory, the counter will be incremented by the size of the
decompressed segment. When fetched to disk, it will be incremented by the
compressed size.
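To illustrate why mixing the two paths skews the counter, here is a minimal, hypothetical sketch (not Hadoop code) using java.util.zip to show how far apart the compressed and decompressed sizes of the same segment can be; a consistent counter must increment by the same one of these in both the in-memory and on-disk fetch paths.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class ShuffleBytesSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical map output segment: a highly compressible payload.
        byte[] decompressed = new byte[64 * 1024];

        // Compress it, standing in for intermediate compression of map output.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(decompressed);
        }
        byte[] compressed = bos.toByteArray();

        // In-memory fetch increments by the decompressed length, on-disk
        // fetch by the compressed length: the same segment contributes
        // very different amounts depending on which path it took.
        System.out.println("decompressed=" + decompressed.length
                + " compressed=" + compressed.length);
    }
}
```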
> Shuffle counter issues
> ----------------------
>
> Key: HADOOP-4845
> URL: https://issues.apache.org/jira/browse/HADOOP-4845
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Chris Douglas
> Fix For: 0.20.0
>
>
> HADOOP-4749 added a new counter tracking the bytes shuffled into the reduce.
> It added an accumulator to ReduceCopier instead of simply incrementing the new
> counter, and did not define a human-readable value in
> src/mapred/org/apache/hadoop/mapred/Task_Counter.properties.