[ 
https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527868#comment-13527868
 ] 

Yu Li commented on HADOOP-7386:
-------------------------------

I tested concatenated bzip2 files on hadoop-1.0.3 with the HADOOP-7823 patch 
applied, and confirmed that an MR job reads them correctly. The detailed steps 
of my test are below:

1) create file test1, with content:
   =================================
   Hello World
   World test
   =================================
2) create file test2, with content:
   =================================
   Hello Jay
   Jay test
   =================================
3) compress them with the command "bzip2 -z test1 test2", which creates 
test1.bz2 and test2.bz2
4) create the concatenated bzip2 file with the command "cat test1.bz2 test2.bz2 > 
test-contatenate.bz2"
5) create a directory in HDFS and put the concatenated bzip2 file in it: "hadoop fs -mkdir 
/tmp/bzip2/input && hadoop fs -put test-contatenate.bz2 /tmp/bzip2/input"
6) run the wordcount example program as the test: "hadoop jar 
$HADOOP_HOME/hadoop-examples*.jar wordcount /tmp/bzip2/input /tmp/bzip2/output"
7) check the result; it is correct, with content:
   =================================
   Hello   2
   Jay     2
   World   2
   test    2
   =================================
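
For anyone who wants to sanity-check the expected numbers without a cluster, 
here is a minimal local sketch in Python. It is not part of the Hadoop test 
above and does not exercise the Hadoop bzip2 codec; it only assumes Python 3.3 
or later, where bz2.decompress accepts multi-stream input. The helper names 
(make_concatenated, word_counts) are made up for this illustration.

```python
import bz2
from collections import Counter


def make_concatenated(parts):
    """Compress each part as its own bzip2 stream and join the streams.

    At the byte level this mirrors `cat test1.bz2 test2.bz2 > out.bz2`.
    """
    return b"".join(bz2.compress(part) for part in parts)


def word_counts(concatenated):
    """Decompress every stream in the input and count whitespace-separated words."""
    text = bz2.decompress(concatenated).decode("utf-8")
    return Counter(text.split())


# Same contents as test1 and test2 in the steps above.
test1 = b"Hello World\nWorld test\n"
test2 = b"Hello Jay\nJay test\n"

data = make_concatenated([test1, test2])
counts = word_counts(data)
for word in sorted(counts):
    print(f"{word}\t{counts[word]}")

# A decoder that handles only a single stream stops at the end of the first
# member; the second member is left over as unused data.
single = bz2.BZ2Decompressor()
first_member = single.decompress(data)
leftover = single.unused_data
```

With a reader that handles all streams, every word is counted twice, matching 
the wordcount output above. A single-stream reader would see only the contents 
of test1, which is the behaviour this issue is about avoiding.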
                
> Support concatenated bzip2 files
> --------------------------------
>
>                 Key: HADOOP-7386
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7386
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Allen Wittenauer
>            Assignee: Karthik Kambatla
>
> HADOOP-6835 added the framework and direct support for concatenated gzip 
> files. We should do the same for bzip2 files.
