[ 
https://issues.apache.org/jira/browse/PIG-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164180#comment-13164180
 ] 

xuting zhao commented on PIG-2391:
----------------------------------

Hi Daniel, I think that is because the patch in PIG-2143 was applied only to 
the 0.10 and later branches.
In 0.9, instead of calling codecFactory.getCodec, which can cause this 
problem, the code checks the suffix of the path directly to determine which 
CompressionCodec class should be applied.

Specifically, in 0.9 the code looks like this:

    if (location.endsWith(".bz2") || location.endsWith(".bz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
    } else if (location.endsWith(".gz")) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    } else {
        FileOutputFormat.setCompressOutput(job, false);
    }

In 0.10, the code looks like this:

    CompressionCodec codec = codecFactory.getCodec(location);
    if (codec != null) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, codec.getClass());
    } else {
        FileOutputFormat.setCompressOutput(job, false);
    }
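
As a rough illustration of how the two approaches can disagree for a path 
ending in ".bz", here is a Hadoop-free sketch. It rests on an assumption that 
is not stated above: that the factory lookup matches a path only against each 
codec's default extension (".bz2" for bzip2, ".gz" for gzip). The class and 
method names are hypothetical, and codecs are represented by plain strings 
rather than the real codec classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, Hadoop-free sketch: codec names are plain strings.
class CodecSelectionSketch {

    // 0.9-style selection: hard-coded suffix checks, so ".bz2" and ".bz"
    // both select bzip2.
    static String bySuffix(String location) {
        if (location.endsWith(".bz2") || location.endsWith(".bz")) {
            return "bzip2";
        } else if (location.endsWith(".gz")) {
            return "gzip";
        }
        return null; // no compression
    }

    // ASSUMPTION: the factory only knows each codec's default extension.
    static final Map<String, String> DEFAULT_EXTENSIONS = new LinkedHashMap<>();
    static {
        DEFAULT_EXTENSIONS.put(".bz2", "bzip2");
        DEFAULT_EXTENSIONS.put(".gz", "gzip");
    }

    // 0.10-style selection: look the path up by registered extension only.
    static String byFactory(String location) {
        for (Map.Entry<String, String> entry : DEFAULT_EXTENSIONS.entrySet()) {
            if (location.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null; // no codec found, so output would not be compressed
    }

    public static void main(String[] args) {
        String[] locations = {"intermediate.bz2", "intermediate.bz", "out.gz"};
        for (String location : locations) {
            System.out.println(location
                    + ": 0.9-style=" + bySuffix(location)
                    + ", 0.10-style=" + byFactory(location));
        }
    }
}
```

Under that assumption, only the ".bz" path gets a different answer from the 
two lookups, which would be consistent with Bzip_1 (bz2) passing while Bzip_2 
(bz) fails.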
                
> Bzip_2 test is broken
> ---------------------
>
>                 Key: PIG-2391
>                 URL: https://issues.apache.org/jira/browse/PIG-2391
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Olga Natkovich
>            Assignee: xuting zhao
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2391-1.patch, PIG-2391.patch
>
>
> This test is currently commented out, but if you uncomment it, it fails with 
> Pig 10 but runs successfully with Pig 9.
> Script:
> a = load '/homes/olgan/studenttab10k' using PigStorage() as (name, age, gpa);
> store a into 'intermediate.bz';
> b = load 'intermediate.bz';
> store b into 'final.bz';
> A couple of observations:
> (1) Identical script (represented by Bzip_1 test) that has bz2 instead of bz 
> extension in the script succeeds in Pig 10
> (2) The problem occurs while reading intermediate.bz which has different size 
> with Pig 9 and Pig 10
> (3) Problem can be reproduced in local mode with small subset of data in the 
> file
> (4) The following stack trace is observed:
> 2011-12-01 13:53:12,280 [Thread-22] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.RuntimeException: java.io.IOException: compressedStream EOF
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:237)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:109)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:119)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.io.IOException: compressedStream EOF
>         at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:92)
>         at org.apache.tools.bzip2r.CBZip2InputStream.compressedStreamEOF(CBZip2InputStream.java:96)
>         at org.apache.tools.bzip2r.CBZip2InputStream.bsR(CBZip2InputStream.java:451)
>         at org.apache.tools.bzip2r.CBZip2InputStream.initBlock(CBZip2InputStream.java:348)
>         at org.apache.tools.bzip2r.CBZip2InputStream.<init>(CBZip2InputStream.java:220)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat$BZip2LineRecordReader.<init>(Bzip2TextInputFormat.java:105)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat.createRecordReader(Bzip2TextInputFormat.java:244)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:227)
>         ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
