[ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579545#action_12579545 ]
lohit vijayarenu commented on HADOOP-1823:
------------------------------------------

I was able to use this bzip2.jar with streaming. This would be a very useful addition.

> want InputFormat for bzip2 files
> --------------------------------
>
>                 Key: HADOOP-1823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1823
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: bzip2.jar
>
>
> Unlike gzip, the bzip file format supports splitting.  Compression is by
> blocks (900k by default) and blocks are separated by a synchronization marker
> (a 48-bit approximation of Pi).  This would permit very large compressed
> files to be split into multiple map tasks, which is not currently possible
> unless using a Hadoop-specific file format.
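
To illustrate the splitting idea in the description, here is a minimal sketch that locates block boundaries in a bzip2 stream by scanning for the 48-bit block marker (0x314159265359, the digits of Pi in BCD). It is not the code in the attached bzip2.jar and not Hadoop code. The function name find_block_offsets is hypothetical, and the sketch assumes Python's standard bz2 module only to produce sample data. Blocks are not byte-aligned, so the scan has to work bit by bit.

```python
import bz2
import random

# 48-bit marker that starts every bzip2 compressed block (BCD digits of pi).
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_offsets(data: bytes) -> list[int]:
    """Return the bit offsets at which a block marker starts.

    Hypothetical helper for illustration. bzip2 packs bits MSB-first and
    does not align blocks to byte boundaries, so a 48-bit window is slid
    over the stream one bit at a time.
    """
    offsets = []
    window = 0
    for byte_index, byte in enumerate(data):
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bit_pos = byte_index * 8 + (7 - shift)  # position of the bit just read
            # Only compare once a full 48 bits have been read.
            if bit_pos >= 47 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 47)
    return offsets


if __name__ == "__main__":
    # Assumption: incompressible sample data and compresslevel=1 (100k blocks)
    # so that the stream contains several blocks.
    rng = random.Random(0)
    raw = rng.getrandbits(8 * 250_000).to_bytes(250_000, "big")
    compressed = bz2.compress(raw, compresslevel=1)
    for offset in find_block_offsets(compressed):
        print(f"block marker at bit {offset} (byte {offset // 8}, bit {offset % 8})")
```

A splitter built on this idea would assign each map task a byte range and have it start decoding at the first marker found at or after the start of its range. A 48-bit pattern can in principle also occur by chance inside compressed data, so a real implementation would need to handle false matches.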