Providing BZip2 splitting support for Text data
-----------------------------------------------

                 Key: MAPREDUCE-830
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.21.0
            Reporter: Abdul Qadeer
            Assignee: Abdul Qadeer
             Fix For: 0.21.0


HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support
for handling BZip2-compressed data so that a compressed input file can be
split at arbitrary points.  This JIRA uses that functionality in
LineRecordReader.  The benefit is that when a user provides BZip2-compressed
Text data, Hadoop will split it and process it with multiple mappers, so
BZip2-compressed data can make full use of the cluster.  Currently a
BZip2-compressed Text file is not split and goes to a single mapper.  The
enhancement in this JIRA therefore provides splitting support and a
considerable performance gain.
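
To illustrate what reading a split at an arbitrary point involves for line
records, here is a minimal sketch.  It is not the LineRecordReader code and it
works on plain uncompressed bytes; the class name SplitLineSketch and the
method readSplit are hypothetical.  It assumes the convention that a line
belongs to the split containing its first byte: a reader whose split does not
start at offset 0 skips forward to the next line start, and every reader
finishes the last line it started even if that line runs past the split end.
Locating block boundaries inside the BZip2 stream, which is the part
HADOOP-4012 provides, is not modelled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineSketch {

    // Returns the lines owned by the split [start, end) of data.
    // A line is owned by the split that contains its first byte.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;

        // If the split starts mid-line, skip ahead to the next line start.
        // The partial line is read by whichever split holds its first byte.
        while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
            pos++;
        }

        // Read every line that starts inside the split, finishing the last
        // one even when it extends beyond 'end'.
        while (pos < end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++;
            }
            lines.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n"
                .getBytes(StandardCharsets.UTF_8);
        int splitSize = 10; // arbitrary; split boundaries fall mid-line

        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + "," + end + ") -> "
                    + readSplit(data, start, end));
        }
    }
}
```

With this convention each line is returned by exactly one split, so the
splits can be handed to independent mappers without losing or duplicating
records.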
