[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-830:
------------------------------------

    Status: Patch Available  (was: Open)

> Providing BZip2 splitting support for Text data
> -----------------------------------------------
>
>                 Key: MAPREDUCE-830
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Abdul Qadeer
>            Assignee: Abdul Qadeer
>             Fix For: 0.21.0
>
>         Attachments: M830-2.patch, M830-3.patch, M830-4.patch,
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds
> support for handling BZip2-compressed data so that a compressed input file
> can be split at arbitrary points. This JIRA uses that functionality in
> LineRecordReader. The benefit is that when a user provides BZip2-compressed
> Text data, Hadoop splits it and multiple mappers process it, so
> BZip2-compressed data can make full use of the cluster. Currently a
> BZip2-compressed Text file goes to a single mapper and is not split. The
> enhancement in this JIRA therefore provides splitting support and
> considerable performance gains.
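For readers wondering why a BZip2 file can be split at all: the format stores data as independently compressed blocks, and each block begins with the 48-bit marker 0x314159265359, which is not necessarily byte-aligned. A reader handed an arbitrary byte offset can therefore scan forward to the next marker and start decoding there. The following is only a minimal sketch of that scanning step in Python, not the code from the attached patches or from HADOOP-4012; the function name find_block_starts and the sample data are hypothetical, chosen for illustration.

```python
import bz2

# 48-bit marker that begins every compressed block in a bzip2 stream.
BLOCK_MAGIC = 0x314159265359
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_starts(data: bytes, start_byte: int = 0) -> list[int]:
    """Return bit offsets of block markers that begin at or after start_byte.

    Hypothetical helper: it slides a 48-bit window over the stream one bit
    at a time (most significant bit first) because markers need not fall
    on byte boundaries.
    """
    offsets = []
    window = 0
    bits_seen = 0
    for byte_index in range(start_byte, len(data)):
        byte = data[byte_index]
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bits_seen += 1
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                # Bit position just past the marker, minus the marker length.
                end_bit = byte_index * 8 + (7 - shift) + 1
                offsets.append(end_bit - MAGIC_BITS)
    return offsets


# Sample input (an assumption for the demo): enough text to fill several
# blocks at compresslevel=1, which uses the smallest block size.
sample = b"".join(b"record %06d\n" % i for i in range(30000))
compressed = bz2.compress(sample, compresslevel=1)

starts = find_block_starts(compressed)
print("block markers found:", len(starts))

# A reader assigned a split beginning in the middle of the file would skip
# ahead to the first marker at or after its starting byte.
split_start = len(compressed) // 2
print("first marker after split start:", find_block_starts(compressed, split_start)[:1])
```

A 48-bit pattern can in principle also occur by chance inside compressed data, so a real reader has to cope with a candidate marker that turns out not to start a valid block; this sketch does not attempt that.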