[ 
https://issues.apache.org/jira/browse/PIG-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuting zhao updated PIG-2391:
-----------------------------

    Fix Version/s: 0.11
           Status: Patch Available  (was: Open)

This problem is caused by PIG-2143, which added the setCompression function. 
That function calls codecFactory.getCodec(path) to determine whether a 
CompressionCodec class exists for the compressed file. 
codecFactory.getCodec() returns null for a .bz file but returns BZip2Codec 
for a .bz2 file. In 0.9, the suffix of the file path was used to determine this. 

In this patch, I modified the setCompression function to treat a .bz file as a 
.bz2 file and added a new unit test for this case. ant test-commit has been run 
successfully on the 0.10 branch and trunk.
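
The idea behind the fix can be sketched roughly as follows. This is a 
standalone illustration, not the code in PIG-2391.patch: the class 
BzSuffixSketch, the toy suffix table, and the helper names are all 
hypothetical, and the lookup only imitates the suffix-based behaviour 
described above (a codec is registered for .bz2 but not for .bz) without 
calling any Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BzSuffixSketch {
    // Hypothetical stand-in for a suffix-to-codec table. Only ".bz2" is
    // registered, mirroring the report that ".bz" has no codec of its own.
    private static final Map<String, String> CODECS_BY_SUFFIX = new LinkedHashMap<>();
    static {
        CODECS_BY_SUFFIX.put(".bz2", "BZip2Codec");
    }

    // Imitates a suffix-based lookup: returns the codec name whose suffix
    // matches the end of the path, or null when no suffix matches.
    static String lookupCodec(String path) {
        for (Map.Entry<String, String> entry : CODECS_BY_SUFFIX.entrySet()) {
            if (path.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    // Treats a ".bz" path as if it were ".bz2"; other paths are unchanged.
    static String normalizeBzSuffix(String path) {
        return path.endsWith(".bz") ? path + "2" : path;
    }

    // Lookup with the normalization applied first.
    static String codecFor(String path) {
        return lookupCodec(normalizeBzSuffix(path));
    }

    public static void main(String[] args) {
        // Without normalization the ".bz" path finds no codec.
        System.out.println("plain lookup:      " + lookupCodec("intermediate.bz"));
        // With normalization both suffixes resolve to the same codec.
        System.out.println("normalized lookup: " + codecFor("intermediate.bz"));
        System.out.println("bz2 lookup:        " + codecFor("intermediate.bz2"));
    }
}
```

Normalizing the path before the lookup keeps the change local to one 
place, so callers that already pass a .bz2 path see no difference.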
                
> Bzip_2 test is broken
> ---------------------
>
>                 Key: PIG-2391
>                 URL: https://issues.apache.org/jira/browse/PIG-2391
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Olga Natkovich
>            Assignee: xuting zhao
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2391.patch
>
>
> This test is currently commented out; if you uncomment it, it fails with 
> Pig 10 but runs successfully with Pig 9.
> Script:
> a = load '/homes/olgan/studenttab10k' using PigStorage() as (name, age, gpa);
> store a into 'intermediate.bz';
> b = load 'intermediate.bz';
> store b into 'final.bz';
> A couple of observations:
> (1) Identical script (represented by Bzip_1 test) that has bz2 instead of bz 
> extension in the script succeeds in Pig 10
> (2) The problem occurs while reading intermediate.bz, which has a different 
> size under Pig 9 than under Pig 10
> (3) The problem can be reproduced in local mode with a small subset of the 
> data in the file
> (4) The following stack trace is observed:
> 2011-12-01 13:53:12,280 [Thread-22] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.RuntimeException: java.io.IOException: compressedStream EOF
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:237)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:109)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:119)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.io.IOException: compressedStream EOF
>         at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:92)
>         at org.apache.tools.bzip2r.CBZip2InputStream.compressedStreamEOF(CBZip2InputStream.java:96)
>         at org.apache.tools.bzip2r.CBZip2InputStream.bsR(CBZip2InputStream.java:451)
>         at org.apache.tools.bzip2r.CBZip2InputStream.initBlock(CBZip2InputStream.java:348)
>         at org.apache.tools.bzip2r.CBZip2InputStream.<init>(CBZip2InputStream.java:220)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat$BZip2LineRecordReader.<init>(Bzip2TextInputFormat.java:105)
>         at org.apache.pig.bzip2r.Bzip2TextInputFormat.createRecordReader(Bzip2TextInputFormat.java:244)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:227)
>         ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
