[ https://issues.apache.org/jira/browse/HADOOP-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677563#action_12677563 ]
Zheng Shao commented on HADOOP-5326:
------------------------------------
Nigel, sorry, I should have included the Hadoop unit test results in my
comments. We tested the patch extensively internally, and I also ran the
Hadoop unit tests myself.
As Rodrigo said, the reason for not adding a separate test is the test data.
The input that triggers the corruption in the current implementation is about
1MB in size, but we cannot disclose it. There is a public data set that also
breaks the old code, but it is about 20MB, and I don't think we want to add
that much data to the Hadoop codebase.
Also, as you can see, the patch literally changes just 2 bytes inside the
BZip2 algorithm itself.
I will definitely be more careful next time.
> bzip2 codec (CBZip2OutputStream) creates corrupted output file for some inputs
> ------------------------------------------------------------------------------
>
> Key: HADOOP-5326
> URL: https://issues.apache.org/jira/browse/HADOOP-5326
> Project: Hadoop Core
> Issue Type: Bug
> Components: io
> Affects Versions: 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.21.0
> Reporter: Rodrigo Schmidt
> Assignee: Rodrigo Schmidt
> Fix For: 0.19.2, 0.20.0, 0.21.0
>
> Attachments: HADOOP-5326.2.patch, HADOOP-5326.patch
>
>
> Bzip2 codec generated corrupted output files in some test executions I
> performed. This bug is probably related to
> https://issues.apache.org/bugzilla/show_bug.cgi?id=41596.
> * In my case, the problem seems to be in the BWT (Burrows-Wheeler Transform)
> implementation.
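For readers unfamiliar with the BWT step mentioned in the issue, here is a minimal, naive Java sketch of the forward and inverse Burrows-Wheeler Transform, which can be used for a round-trip check. It is an illustration only and is not the block-sorting code in CBZip2OutputStream: it sorts full rotations directly (roughly O(n^2 log n)), and it appends a NUL sentinel (assumed absent from the input) where bzip2 instead stores a pointer to the row holding the original block. The class name NaiveBwt is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Hadoop.
final class NaiveBwt {
    // End-of-text sentinel; assumed absent from the input and smaller than every input char.
    static final char SENTINEL = '\0';

    private NaiveBwt() {}

    /** Forward BWT: the last column of the sorted rotation matrix of s + SENTINEL. */
    static String forward(String s) {
        if (s.indexOf(SENTINEL) >= 0) {
            throw new IllegalArgumentException("input must not contain the sentinel");
        }
        String t = s + SENTINEL;
        int n = t.length();
        Integer[] rot = new Integer[n];
        for (int i = 0; i < n; i++) rot[i] = i;
        // Sort rotation start positions by comparing the rotations character by character.
        Arrays.sort(rot, (a, b) -> {
            for (int k = 0; k < n; k++) {
                char ca = t.charAt((a + k) % n);
                char cb = t.charAt((b + k) % n);
                if (ca != cb) return Character.compare(ca, cb);
            }
            return 0;
        });
        // Emit the character just before each sorted rotation's start (its last column).
        StringBuilder out = new StringBuilder(n);
        for (int r : rot) out.append(t.charAt((r + n - 1) % n));
        return out.toString();
    }

    /** Inverse BWT: rebuilds the original string (without the sentinel) from the last column. */
    static String inverse(String last) {
        int n = last.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Stable sort of last-column positions gives the first column: F[i] = last[order[i]].
        Arrays.sort(order, (a, b) -> Character.compare(last.charAt(a), last.charAt(b)));
        // Row 0 is the rotation starting with the sentinel; order[row] is the rotation
        // starting one position later in the text, so walk it n - 1 times.
        StringBuilder out = new StringBuilder(Math.max(0, n - 1));
        int row = 0;
        for (int k = 0; k < n - 1; k++) {
            row = order[row];
            out.append(last.charAt(order[row])); // first character of that rotation
        }
        return out.toString();
    }
}
```

For example, NaiveBwt.forward("banana") yields "annb", the NUL sentinel, then "aa", and NaiveBwt.inverse of that string gives back "banana". A check of the form inverse(forward(x)).equals(x) is the kind of round-trip property that corrupted output would violate.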
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.