Providing BZip2 splitting support for Text data
-----------------------------------------------
Key: MAPREDUCE-830
URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Abdul Qadeer
Assignee: Abdul Qadeer
Fix For: 0.21.0
HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) provides
support for reading BZip2-compressed data such that an input compressed file
can be split at arbitrary points. This JIRA uses that functionality in
LineRecordReader. The benefit is that when a user provides BZip2-compressed
text data, Hadoop will split it so that it is processed by multiple mappers,
allowing BZip2-compressed input to make full use of the cluster. Currently, a
BZip2-compressed text file is not split and goes to a single mapper. The
enhancement in this JIRA therefore provides splitting support and considerable
performance gains.
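Splitting a compressed file at an arbitrary point only works if a reader that
starts mid-file can find a place where decompression can resume. The bzip2
format marks the start of every compressed block with the 48-bit magic number
0x314159265359, and these blocks are not byte-aligned. The sketch below shows
the general idea of resynchronizing on that marker by scanning bit by bit. It
is a minimal illustration under that assumption, not the HADOOP-4012 or
LineRecordReader code; the class and method names (Bzip2BlockScanner,
findBlockMarker) are hypothetical.

```java
/**
 * Hypothetical helper (not Hadoop's implementation): locates bzip2
 * block-start markers at bit granularity.
 */
final class Bzip2BlockScanner {
    /** 48-bit magic that begins every compressed block in the bzip2 format. */
    static final long BLOCK_MAGIC = 0x314159265359L;
    private static final long MASK_48 = (1L << 48) - 1;

    private Bzip2BlockScanner() {}

    /**
     * Returns the bit offset of the first block marker that begins at or
     * after fromBit, or -1 if none is found. A match is only a candidate:
     * the same 48 bits can also occur by chance inside compressed data.
     */
    static long findBlockMarker(byte[] data, long fromBit) {
        long start = Math.max(fromBit, 0);
        long totalBits = (long) data.length * 8;
        long window = 0; // last bits read since start, oldest bit highest
        for (long bit = start; bit < totalBits; bit++) {
            int b = data[(int) (bit >>> 3)] & 0xFF;
            // bzip2 packs bits most-significant-first within each byte
            int value = (b >>> (7 - (int) (bit & 7))) & 1;
            window = ((window << 1) | value) & MASK_48;
            long markerStart = bit - 47;
            // only compare once a full 48-bit window has been read since start
            if (markerStart >= start && window == BLOCK_MAGIC) {
                return markerStart;
            }
        }
        return -1;
    }
}
```

The scan works at bit rather than byte granularity because a bzip2 block can
begin at any bit offset, so a byte-oriented search would miss most markers.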
--
This message is automatically generated by JIRA.