[ https://issues.apache.org/jira/browse/MAPREDUCE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871853#action_12871853 ]
David Ciemiewicz commented on MAPREDUCE-477:
--------------------------------------------
This does not appear to be solved in the version of Hadoop that I am using:
Hadoop 0.20.10.0.1004192217
I cannot say whether this is fixed in trunk.
I created two files, file1.bz2 and file2.bz2, and concatenated them into
file12.bz2:
-bash-3.1$ bzcat file12.bz2
contents of file1.bz2
contents of file2.bz2
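As an illustration only (this is not part of the original test), the same kind of input can be built without the bzip2 command-line tools. The sketch below is Python, and the helper name make_concatenated is hypothetical. It compresses each piece as its own bzip2 stream and joins the streams byte for byte, assuming the files were concatenated with something equivalent to cat:

```python
import bz2


def make_concatenated(parts):
    """Compress each part as its own bzip2 stream and join the raw streams."""
    return b"".join(bz2.compress(part) for part in parts)


# Stand-ins for the two files in the report (assumed to hold one line each).
data = make_concatenated([
    b"contents of file1.bz2\n",
    b"contents of file2.bz2\n",
])

# bz2.decompress() accepts multi-stream input (Python 3.3+), as bzcat does.
print(bz2.decompress(data).decode(), end="")
```

A multi-stream-aware reader returns both lines here, which matches what bzcat printed above.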
I then ran a simple Pig script to dump the contents of this file:
-bash-3.1$ cat concat.pig
A = load 'file12.bz2' using PigStorage();
dump A;
The output below shows that only the first file in the concatenation is read.
The subsequent file is not read.
-bash-3.1$ pig -Dmapred.job.queue.name=... concat.pig
USING: /grid/0/gs/pig/current
2010-05-26 17:54:06,501 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/ciemo/.../pig_1274896446499.log
2010-05-26 17:54:06,750 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://...:8020
2010-05-26 17:54:07,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ...:50300
2010-05-26 17:54:07,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-05-26 17:54:07,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2010-05-26 17:54:08,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-05-26 17:54:08,835 [Thread-9] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-05-26 17:54:09,834 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Cannot get jobid for this job
2010-05-26 17:54:32,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-05-26 17:55:09,412 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-05-26 17:55:09,412 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://...
2010-05-26 17:55:11,158 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 1
2010-05-26 17:55:11,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 34
2010-05-26 17:55:11,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(contents of file1.bz2)
The dump should have shown the contents of both file1.bz2 and file2.bz2.
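The symptom is consistent with a reader that stops at the first end-of-stream marker, although that has not been verified against the Hadoop source. The Python sketch below is not Hadoop's Bzip2Codec; it only illustrates the difference between a single-stream decoder and one that restarts on the remaining bytes. The function names and sample data are hypothetical:

```python
import bz2


def read_first_stream_only(data):
    """Decode one bzip2 stream and return (output, bytes left unread)."""
    decoder = bz2.BZ2Decompressor()
    return decoder.decompress(data), decoder.unused_data


def read_all_streams(data):
    """Start a fresh decoder for every stream until no bytes remain."""
    chunks = []
    while data:
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            raise ValueError("truncated bzip2 stream")
        data = decoder.unused_data  # bytes following the end-of-stream marker
    return b"".join(chunks)


# Two independent streams joined back to back, standing in for file12.bz2.
concatenated = (bz2.compress(b"contents of file1.bz2\n")
                + bz2.compress(b"contents of file2.bz2\n"))

first, leftover = read_first_stream_only(concatenated)
everything = read_all_streams(concatenated)
```

In this sketch, first holds only the first file's line and leftover still holds the whole second stream, while everything holds both lines.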
> Support for reading bzip2 compressed file created using concatenation of
> multiple .bz2 files
> ---------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-477
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-477
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Suhas Gogate
> Priority: Minor
>
> Bzip2Codec, supported in Hadoop 0.19/0.20, should support reading bzip2
> compressed files created by concatenating multiple .bz2 files.