[jira] Commented: (MAPREDUCE-477) Support for reading bzip2 compressed file created using concatenation of multiple .bz2 files

Yuri Pradkin (JIRA) Wed, 26 May 2010 15:42:09 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871993#action_12871993 ]


Yuri Pradkin commented on MAPREDUCE-477:
----------------------------------------

Just tried this on our cluster:
    echo "content1" | bzip2 - >foo.bz2
    echo "content2" | bzip2 - >>foo.bz2
    bzcat foo.bz2
    {quote}
    content1
    content2
    {quote}
    hdfs -put foo.bz2 foo.bz2
    hadoop jar .../hadoop-streaming.jar -input foo.bz2 -output foo -mapper /bin/cat -reducer /bin/cat

This completes after scheduling some ridiculous number of splits (98).

    hdfs -getmerge foo foo
    cat foo
    {quote}
    content1
    content2
    {quote}

mapreduce/common: trunk rev 897063
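
For reference, the multi-stream behavior exercised above can be sketched outside Hadoop. This is only a minimal illustration using Python's standard bz2 module, not Hadoop's Bzip2Codec, and the variable names are made up for the example. It contrasts a reader that continues across concatenated streams with one that stops at the first end-of-stream marker, which is presumably the failure mode this issue is about.

```python
import bz2

# Two independent bzip2 streams, as produced by two separate `bzip2` runs.
stream1 = bz2.compress(b"content1\n")
stream2 = bz2.compress(b"content2\n")

# Appending one stream to the other mirrors the `>>foo.bz2` step.
concatenated = stream1 + stream2

# A multi-stream-aware reader returns the data from both streams.
full = bz2.decompress(concatenated)

# A single-stream reader stops at the first end-of-stream marker;
# the bytes of the second stream are left over, undecoded.
single = bz2.BZ2Decompressor()
first_only = single.decompress(concatenated)
leftover = single.unused_data

print(full)
print(first_only)
print(len(leftover) == len(stream2))
```

A reader of the second kind would silently drop "content2", so the check that matters in the Hadoop run is that both lines come back from the job output.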


> Support for reading bzip2 compressed file created using concatenation of 
> multiple .bz2 files 
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-477
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-477
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Suhas Gogate
>            Priority: Minor
>
> The Bzip2Codec supported in Hadoop 0.19/0.20 should support reading bzip2 
> compressed files created by concatenating multiple .bz2 files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
