[ 
https://issues.apache.org/jira/browse/MAPREDUCE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871853#action_12871853
 ] 

David Ciemiewicz commented on MAPREDUCE-477:
--------------------------------------------

This does not appear to be solved in the version of Hadoop that I am using:
Hadoop 0.20.10.0.1004192217

I cannot say whether this is fixed in trunk.

I created two files, file1.bz2 and file2.bz2, and concatenated them into
file12.bz2:

-bash-3.1$ bzcat file12.bz2
contents of file1.bz2
contents of file2.bz2

I then ran a simple Pig script to dump the contents of this file:

-bash-3.1$ cat concat.pig
A = load 'file12.bz2' using PigStorage();
dump A;


The output below shows that only the first file in the concatenation is read. 
The subsequent file is not read.

-bash-3.1$ pig -Dmapred.job.queue.name=... concat.pig
USING: /grid/0/gs/pig/current
2010-05-26 17:54:06,501 [main] INFO  org.apache.pig.Main - Logging error messages to: /homes/ciemo/.../pig_1274896446499.log
2010-05-26 17:54:06,750 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://...:8020
2010-05-26 17:54:07,001 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ...:50300
2010-05-26 17:54:07,830 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-05-26 17:54:07,830 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2010-05-26 17:54:08,804 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-05-26 17:54:08,835 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-05-26 17:54:09,834 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Cannot get jobid for this job
2010-05-26 17:54:32,745 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-05-26 17:55:09,412 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-05-26 17:55:09,412 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://...
2010-05-26 17:55:11,158 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 1
2010-05-26 17:55:11,159 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 34
2010-05-26 17:55:11,159 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(contents of file1.bz2)

The dump should have shown the contents of both file1.bz2 and file2.bz2.
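
For anyone who wants to see the same symptom outside Hadoop, here is a minimal
sketch using Python's standard bz2 module. It is only an illustration of the
general problem, not Hadoop's or Pig's code: a reader that stops at the first
end-of-stream marker returns only the first file, while a reader that restarts
on the leftover bytes returns both. The function names are hypothetical, and
the in-memory byte strings stand in for file1.bz2 and file2.bz2.

```python
import bz2

# Two independently compressed streams joined byte-for-byte,
# equivalent to: cat file1.bz2 file2.bz2 > file12.bz2
part1 = bz2.compress(b"contents of file1.bz2\n")
part2 = bz2.compress(b"contents of file2.bz2\n")
concatenated = part1 + part2


def decode_first_stream_only(data: bytes) -> bytes:
    """Stop at the first end-of-stream marker (the behavior reported above)."""
    decompressor = bz2.BZ2Decompressor()
    # Bytes after the first stream are left in decompressor.unused_data.
    return decompressor.decompress(data)


def decode_all_streams(data: bytes) -> bytes:
    """Restart a fresh decompressor on the leftover bytes after each stream."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        data = decompressor.unused_data
    return b"".join(chunks)


if __name__ == "__main__":
    print(decode_first_stream_only(concatenated).decode(), end="")
    print("---")
    print(decode_all_streams(concatenated).decode(), end="")
```

The second function mirrors what bzcat does with file12.bz2 in the session
above: it treats the end of one stream as a possible start of the next rather
than as end of input.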

> Support for reading bzip2 compressed file created using concatenation of 
> multiple .bz2 files 
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-477
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-477
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Suhas Gogate
>            Priority: Minor
>
> Bzip2Codec supported in Hadoop 0.19/0.20 should support reading bzip2
> compressed files created by concatenating multiple .bz2 files

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
