[
https://issues.apache.org/jira/browse/MAPREDUCE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Binglin Chang updated MAPREDUCE-6067:
-------------------------------------
Attachment: MAPREDUCE-6067.v2.patch
Updated the patch; changes:
1. Add counter support for MAP_OUTPUT_RECORDS, MAP_OUTPUT_BYTES, and
MAP_OUTPUT_MATERIALIZED_BYTES.
2. Add counter verification (only MAP_OUTPUT_RECORDS, REDUCE_INPUT_GROUPS,
and REDUCE_INPUT_RECORDS are verified). Because the native collector uses a
different serialization method, MAP_OUTPUT_BYTES is not the same and is not
verified; because map output record order may differ and compression is
applied, MAP_OUTPUT_MATERIALIZED_BYTES may also differ, so it is not
verified either.
3. Update to the new API's TaskCounter from the old API's Task$Counter.
4. Remove some unused counters.
5. Clean up some logging.
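The verification strategy in item 2 can be sketched as follows. This is a hypothetical illustration, not code from the patch: the class name, method, and use of plain maps instead of Hadoop's Counters API are all assumptions. Only counters that are deterministic across collectors (record and group counts) are compared; byte-based counters are skipped because serialization and compression make them differ.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compare only the counters that should be identical
// between the native collector and the default Java collector.
public class CounterCheck {
    // Record/group counts are deterministic; byte counts
    // (MAP_OUTPUT_BYTES, MAP_OUTPUT_MATERIALIZED_BYTES) are not compared.
    static final List<String> STABLE_COUNTERS = List.of(
            "MAP_OUTPUT_RECORDS", "REDUCE_INPUT_GROUPS", "REDUCE_INPUT_RECORDS");

    static boolean countersMatch(Map<String, Long> nativeCounters,
                                 Map<String, Long> javaCounters) {
        for (String name : STABLE_COUNTERS) {
            // Distinct defaults so a counter missing from both maps
            // still counts as a mismatch.
            Long a = nativeCounters.getOrDefault(name, -1L);
            Long b = javaCounters.getOrDefault(name, -2L);
            if (!a.equals(b)) {
                return false;
            }
        }
        return true;
    }
}
```

A real test would pull these values from `Job.getCounters()` on two runs of the same input, one with the native collector enabled and one without.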
> native-task: spilled records counter is incorrect
> -------------------------------------------------
>
> Key: MAPREDUCE-6067
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6067
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: task
> Reporter: Todd Lipcon
> Assignee: Binglin Chang
> Attachments: MAPREDUCE-6067.v1.patch, MAPREDUCE-6067.v2.patch,
> native-counters.html, trunk-counters.html
>
>
> After running a terasort, I see the spilled records counter at 5028651606,
> which is about half of what I expected. Using the non-native collector I
> see the expected count of 10000000000. It seems the correct number of
> records was indeed spilled, because the job's output record count is
> correct.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)