[
https://issues.apache.org/jira/browse/MAPREDUCE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Binglin Chang updated MAPREDUCE-6067:
-------------------------------------
Attachment: MAPREDUCE-6067.v2.patch
Updated the patch; changes:
1. Add counter support for MAP_OUTPUT_RECORDS, MAP_OUTPUT_BYTES, and
MAP_OUTPUT_MATERIALIZED_BYTES.
2. Add counter verification (only MAP_OUTPUT_RECORDS, REDUCE_INPUT_GROUPS,
and REDUCE_INPUT_RECORDS are verified). Because the native collector uses a
different serialization method, MAP_OUTPUT_BYTES is not the same and is not
verified; because map output record order may differ and compression is
applied, MAP_OUTPUT_MATERIALIZED_BYTES may also differ, so it is not
verified either.
3. Update to the new API's TaskCounter from the old API's Task$Counter.
4. Remove some unused counters.
5. Clean up some logging.
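The verification strategy in item 2 can be sketched as follows. This is a hypothetical illustration, not code from the patch: the class name, method, and use of plain maps instead of Hadoop's Counters API are all assumptions. Only counters that are deterministic across collectors (record and group counts) are compared; byte-based counters are skipped because serialization and compression make them differ.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compare only the counters that should be identical
// between the native collector and the default Java collector.
public class CounterCheck {
    // Record/group counts are deterministic; byte counts
    // (MAP_OUTPUT_BYTES, MAP_OUTPUT_MATERIALIZED_BYTES) are not compared.
    static final List<String> STABLE_COUNTERS = List.of(
            "MAP_OUTPUT_RECORDS", "REDUCE_INPUT_GROUPS", "REDUCE_INPUT_RECORDS");

    static boolean countersMatch(Map<String, Long> nativeCounters,
                                 Map<String, Long> javaCounters) {
        for (String name : STABLE_COUNTERS) {
            // Distinct defaults so a counter missing from both maps
            // still counts as a mismatch.
            Long a = nativeCounters.getOrDefault(name, -1L);
            Long b = javaCounters.getOrDefault(name, -2L);
            if (!a.equals(b)) {
                return false;
            }
        }
        return true;
    }
}
```

A real test would pull these values from `Job.getCounters()` on two runs of the same input, one with the native collector enabled and one without.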
> native-task: spilled records counter is incorrect
> -------------------------------------------------
>
> Key: MAPREDUCE-6067
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6067
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: task
> Reporter: Todd Lipcon
> Assignee: Binglin Chang
> Attachments: MAPREDUCE-6067.v1.patch, MAPREDUCE-6067.v2.patch,
> native-counters.html, trunk-counters.html
>
>
> After running a terasort, I see the spilled records counter at 5028651606,
> which is about half of what I expected. Using the non-native collector I
> see the expected count of 10000000000. It seems the correct number of
> records was indeed spilled, because the job's output record count is
> correct.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)