[ https://issues.apache.org/jira/browse/HADOOP-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677563#action_12677563 ]

Zheng Shao commented on HADOOP-5326:
------------------------------------

Nigel, sorry, I should have included the Hadoop unit test results in my 
comments. We did test the patch extensively here internally, and I also ran 
the Hadoop unit tests myself.

The reason for not adding a separate test is the one Rodrigo gave. The data 
that triggers the corruption in the current implementation is about 1MB in 
size, but we cannot disclose it. There is another public data set that breaks 
the old code, but it is about 20MB in size, and I don't think we want to add 
that much data to the Hadoop codebase.

Also, as you can see, the patch literally changes just 2 bytes inside the 
BZip2 algorithm itself.

I will definitely be more careful next time.


> bzip2 codec (CBZip2OutputStream) creates corrupted output file for some inputs
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-5326
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5326
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.21.0
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>             Fix For: 0.19.2, 0.20.0, 0.21.0
>
>         Attachments: HADOOP-5326.2.patch, HADOOP-5326.patch
>
>
> The bzip2 codec generated corrupted output files in some of the test 
> executions I performed. This bug is probably related to 
> https://issues.apache.org/bugzilla/show_bug.cgi?id=41596.
> * In my case, the problem seems to be in the BWT (Burrows-Wheeler Transform) 
> implementation.

-- 
This message is automatically generated by JIRA.