[ https://issues.apache.org/jira/browse/MAPREDUCE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096747#comment-13096747 ]
Harsh J commented on MAPREDUCE-2910:
------------------------------------
How much overhead do compressed, empty partition files actually add?
> Allow empty MapOutputFile segments
> ----------------------------------
>
> Key: MAPREDUCE-2910
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2910
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task, tasktracker
> Affects Versions: 0.20.2, 0.23.0
> Reporter: Binglin Chang
> Priority: Minor
> Fix For: 0.23.0
>
>
> As clusters and jobs grow larger, we see many empty partitions in
> MapOutputFile, caused by large numbers of reducers or by partition skew. When
> map output compression is enabled, each empty map output partition becomes
> larger and incurs additional compressor/decompressor initialization overhead.
> This can be optimized by allowing empty MapOutputFile segments, in which the
> rawLength and partLength of the IndexRecord are both 0. Corresponding support
> needs to be added to the IFile reader and writer and to the reduce shuffle copier.
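
The idea in the description can be illustrated with a small, standalone sketch. It is not Hadoop code: IndexEntry is a simplified stand-in for IndexRecord, writePartition and needsFetch are hypothetical helpers, and java.util.zip's gzip is assumed as the codec purely for illustration. It shows that compressing zero bytes still produces output, and that a zero-length index entry lets the writer emit nothing and the copier skip the segment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class EmptySegmentSketch {

    /** Simplified stand-in for an index entry; NOT Hadoop's actual IndexRecord. */
    static final class IndexEntry {
        final long startOffset;
        final long rawLength;   // uncompressed length
        final long partLength;  // on-disk (compressed) length

        IndexEntry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /** Gzip-compresses the given bytes into a fresh buffer. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Size in bytes of a gzip stream that carries no payload at all. */
    static int gzipSizeOfEmptyInput() {
        return gzip(new byte[0]).length;
    }

    /**
     * Hypothetical writer step: appends one partition to the output file and
     * returns its index entry. An empty partition writes no bytes and is
     * recorded with rawLength == partLength == 0.
     */
    static IndexEntry writePartition(ByteArrayOutputStream out, byte[] records) {
        long start = out.size();
        if (records.length == 0) {
            return new IndexEntry(start, 0, 0); // no codec setup, no bytes
        }
        byte[] compressed = gzip(records);
        out.write(compressed, 0, compressed.length);
        return new IndexEntry(start, records.length, compressed.length);
    }

    /** Hypothetical reader/copier check: empty segments need no fetch. */
    static boolean needsFetch(IndexEntry entry) {
        return !entry.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("gzip bytes for empty input: " + gzipSizeOfEmptyInput());

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        IndexEntry empty = writePartition(file, new byte[0]);
        IndexEntry full = writePartition(file, "key\tvalue\n".getBytes());

        System.out.println("empty partition on-disk bytes: " + empty.partLength
                + ", needs fetch: " + needsFetch(empty));
        System.out.println("non-empty partition on-disk bytes: " + full.partLength
                + ", needs fetch: " + needsFetch(full));
    }
}
```

Running main prints the size of an empty gzip stream next to the zero bytes written for the empty partition. The real per-segment overhead in Hadoop depends on the configured codec and on IFile's own framing, so the figure is only indicative.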
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira