[ https://issues.apache.org/jira/browse/HADOOP-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192365#comment-16192365 ]
Hadoop QA commented on HADOOP-14919:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 19s{color} | {color:orange} root: The patch generated 2 new + 118 unchanged - 9 fixed = 120 total (was 127) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 22s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 37s{color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 49s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | HADOOP-14919 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12890447/HADOOP-14919.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 081a67763b93 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cae1c73 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13454/artifact/patchprocess/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/13454/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13454/testReport/ |
| modules | C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient U: . |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13454/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> BZip2 drops records when reading data in splits
> -----------------------------------------------
>
> Key: HADOOP-14919
> URL: https://issues.apache.org/jira/browse/HADOOP-14919
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
> Reporter: Aki Tanaka
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: 250000.bz2, HADOOP-14919.001.patch, HADOOP-14919-test.patch
>
>
> BZip2 can drop records when reading data in splits. This problem was already
> discussed in HADOOP-11445 and HADOOP-13270, but a corner case remains that
> causes data blocks to be lost.
>
> I attached a unit test for this issue. Running the unit test reproduces the
> problem.
>
> First, this issue happens when the position of the newly created stream is
> equal to the start of the split. Hadoop has some test cases for this
> (the blockEndingInCR.txt.bz2 file for
> TestLineRecordReader#testBzip2SplitStartAtBlockMarker, etc.). However, the
> issue I am reporting does not occur when we run these tests, because it
> happens only when the byte block at the start of the split includes both a
> block marker and compressed data.
>
> BZip2 block marker - 0x314159265359
> (001100010100000101011001001001100101001101011001)
>
> blockEndingInCR.txt.bz2 (Start of Split - 136504):
> {code:java}
> $ xxd -l 6 -g 1 -b -seek 136498 ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-classes/blockEndingInCR.txt.bz2
> 0021532: 00110001 01000001 01011001 00100110 01010011 01011001 1AY&SY
> {code}
>
> Test bz2 File (Start of Split - 203426)
> {code:java}
> $ xxd -l 7 -g 1 -b -seek 203419 250000.bz2
> 0031a9b: 11100110 00101000 00101011 00100100 11001010 01101011 .(+$.k
> 0031aa1: 00101111 /
> {code}
>
> Suppose a job divides this test bz2 file into two splits, with the second
> split starting at position 203426.
> The former split does not read the records that start at position 203426,
> because BZip2 reports the position of these dropped records as 203427. The
> latter split does not read the records either, because
> BZip2CompressionInputStream reads the block from position 320955.
> Due to this behavior, records between 203427 and 320955 are lost.
> Also, if we revert the changes in HADOOP-13270, we do not see this issue,
> although the HADOOP-13270 issue then reappears.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
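
The difference between the two xxd dumps quoted in the issue description can be checked with a small standalone script. The following is only an illustrative sketch in Python, not Hadoop code: the helper name `find_block_magic` is hypothetical, and the two byte windows are transcribed from the xxd output in the description. It reports the bit offset at which the 48-bit block marker 0x314159265359 begins within each window.

```python
# Illustrative sketch only; find_block_magic is a hypothetical helper,
# not part of Hadoop or of the attached patch.
BLOCK_MAGIC = 0x314159265359  # bzip2 block marker quoted in the issue
MAGIC_BITS = 48


def find_block_magic(window):
    """Return every bit offset in `window` (bytes) where the marker starts."""
    total_bits = len(window) * 8
    value = int.from_bytes(window, "big")
    mask = (1 << MAGIC_BITS) - 1
    hits = []
    for offset in range(total_bits - MAGIC_BITS + 1):
        # Shift so that the 48 bits starting at `offset` become the low bits.
        shift = total_bits - MAGIC_BITS - offset
        if (value >> shift) & mask == BLOCK_MAGIC:
            hits.append(offset)
    return hits


# Bytes transcribed from the two xxd dumps in the issue description.
aligned_window = bytes.fromhex("314159265359")      # blockEndingInCR.txt.bz2
unaligned_window = bytes.fromhex("e6282b24ca6b2f")  # 250000.bz2

print(find_block_magic(aligned_window))    # [0]
print(find_block_magic(unaligned_window))  # [3]
```

In the first window the marker is byte-aligned and ends exactly at the split boundary. In the second it starts 3 bits into the window, so its last 3 bits share the final byte (0x2f, at position 203425) with 5 further bits, which is consistent with the reporter's description of a byte block that holds both block marker and compressed data.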