[ https://issues.apache.org/jira/browse/MAPREDUCE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405406#comment-13405406 ]
Ankit Kamboj commented on MAPREDUCE-4354:
-----------------------------------------
Thanks!
I tested this patch by running a Hive query on the same dataset, with and without the patch.
The results of the latest test are as follows:
Map execution times:
1. Without patch: 9.22 mins
2. With patch: 6.42 mins (about 30% less than without patch; the unpatched run takes about 43% longer)
The overall (map + reduce) execution time:
1. Without patch: 14.61 mins
2. With patch: 11.85 mins (about 19% less than without patch; the unpatched run takes about 23% longer)
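
For reference, the relative differences can be recomputed from the raw timings above. The sketch below is illustrative only (the class and method names are made up for this note). It distinguishes the time saved relative to the unpatched run (roughly 30% for map, 19% overall) from how much longer the unpatched run is relative to the patched one (roughly 43% and 23%).

```java
public class TimingDelta {
    /** Percentage of the baseline (unpatched) time that the patched run saves. */
    static double percentSaved(double baselineMins, double patchedMins) {
        return 100.0 * (baselineMins - patchedMins) / baselineMins;
    }

    /** Percentage by which the baseline (unpatched) run is longer than the patched run. */
    static double percentLonger(double baselineMins, double patchedMins) {
        return 100.0 * (baselineMins - patchedMins) / patchedMins;
    }

    public static void main(String[] args) {
        // Timings in minutes, taken from the test results above.
        double[][] runs = { {9.22, 6.42}, {14.61, 11.85} };
        String[] labels = { "map", "map + reduce" };
        for (int i = 0; i < runs.length; i++) {
            System.out.printf("%s: %.1f%% saved, unpatched is %.1f%% longer%n",
                    labels[i],
                    percentSaved(runs[i][0], runs[i][1]),
                    percentLonger(runs[i][0], runs[i][1]));
        }
    }
}
```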
> Performance improvement with compressor object reinit restriction
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-4354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4354
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: performance
> Affects Versions: 0.20.205.0
> Reporter: Ankit Kamboj
> Priority: Minor
> Labels: performance
> Fix For: 0.20.205.0
>
> Attachments: codec_reinit_diff, modify_lzo_codec_reinit
>
>
> The HADOOP-5879 patch aimed to make GzipCodec pick up its settings from the
> conf instead of the defaults. It also introduced re-initialization of the
> recycled compressor object. In our performance tests, this re-initialization
> caused a performance degradation of 15% for LzoCodec, because
> re-initializing an Lzo compressor reallocates its buffers. LzoCodec takes its
> initial settings from the config, so re-initializing it is unnecessary. This
> patch checks the codec class and calls reinit only if the codec class is
> Gzip, which led to a significant performance improvement of 15% for LzoCodec.
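
The idea described above can be sketched as follows. This is a minimal, self-contained illustration and not the attached patch: the Compressor and Codec interfaces, the two codec classes, and the Pool class are simplified stand-ins invented for this note, and a plain Map stands in for the Hadoop configuration object. It shows only the control flow: a recycled compressor is re-initialized when the codec is Gzip and is handed back untouched otherwise.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ReinitSketch {
    // Hypothetical stand-ins for the real compressor and codec types.
    interface Compressor { void reinit(Map<String, String> conf); }
    interface Codec { Compressor createCompressor(); }

    /** Compressor that only counts how often reinit is called. */
    static class CountingCompressor implements Compressor {
        int reinitCalls = 0;
        @Override public void reinit(Map<String, String> conf) { reinitCalls++; }
    }

    static class GzipCodec implements Codec {
        @Override public Compressor createCompressor() { return new CountingCompressor(); }
    }

    static class LzoCodec implements Codec {
        @Override public Compressor createCompressor() { return new CountingCompressor(); }
    }

    /** Simplified pool that keeps idle compressors per codec class. */
    static class Pool {
        private final Map<Class<?>, Deque<Compressor>> idle = new HashMap<>();

        Compressor borrow(Codec codec, Map<String, String> conf) {
            Deque<Compressor> queue = idle.get(codec.getClass());
            Compressor recycled = (queue == null) ? null : queue.poll();
            if (recycled == null) {
                // Nothing to recycle: a fresh compressor needs no reinit.
                return codec.createCompressor();
            }
            // Re-initialize only for Gzip; other codecs (e.g. Lzo) skip the
            // call and so avoid reallocating their buffers.
            if (codec instanceof GzipCodec) {
                recycled.reinit(conf);
            }
            return recycled;
        }

        void giveBack(Codec codec, Compressor compressor) {
            idle.computeIfAbsent(codec.getClass(), k -> new ArrayDeque<>()).push(compressor);
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        Map<String, String> conf = new HashMap<>();
        for (Codec codec : new Codec[] { new GzipCodec(), new LzoCodec() }) {
            Compressor first = pool.borrow(codec, conf);
            pool.giveBack(codec, first);
            CountingCompressor reused = (CountingCompressor) pool.borrow(codec, conf);
            System.out.println(codec.getClass().getSimpleName()
                    + " reinit calls on reuse: " + reused.reinitCalls);
        }
    }
}
```

A check on the codec class keeps the change local to the pool, at the cost of hard-coding one codec there; whether that trade-off matches the attached diffs is not confirmed by this note.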