[
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068170#comment-17068170
]
Hadoop QA commented on HDFS-15240:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m
0s{color} | {color:red} The patch doesn't appear to include any new or modified
tests. Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
21m 6s{color} | {color:green} branch has no errors when building and testing
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m
20s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m
42s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 42s{color}
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}
0m 47s{color} | {color:orange} The patch fails to run checkstyle in root
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m
21s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m
0s{color} | {color:red} patch has errors when building and testing our client
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m
19s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m
16s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m
49s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 19s{color}
| {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 16s{color}
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
35s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 1s{color} |
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-15240 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12997870/HDFS-15240.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall
mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4d271edefa4e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / eaaaba1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| mvninstall |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-client.txt
|
| compile |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-compile-root.txt
|
| javac |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-compile-root.txt
|
| checkstyle |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out//testptch/patchprocess/maven-patch-checkstyle-root.txt
|
| mvnsite |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-client.txt
|
| findbugs |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client.txt
|
| javadoc |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client.txt
|
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-client.txt
|
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/testReport/ |
| Max. process+thread count | 4022 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common
hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/29029/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Erasure Coding: dirty buffer causes reconstruction block error
> --------------------------------------------------------------
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Attachments: HDFS-15240.001.patch
>
>
> When reading some lzo files, we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k)
> directly from the DNs, chose 6 blocks (b0-b5) to decode the other 3 blocks
> (b6', b7', b8'), and then computed the longest common substring (LCS)
> between b6' (decoded) and b6 (read from the DN), and likewise for b7'/b7
> and b8'/b8.
> After selecting 6 blocks of the block group per combination and iterating
> through all cases, I found one case where the LCS length equals the block
> length minus 64KB; 64KB is exactly the length of the ByteBuffer used by
> StripedBlockReader. So the corrupt reconstruction block was produced by a
> dirty buffer.
> The following log snippet (showing only 2 of the 28 cases) is the output of
> my check program. In my case, I knew block 3 was corrupt, so 5 other blocks
> were needed to decode another 3 blocks; block 1's LCS length turned out to
> be the block length minus 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the
> dirty buffer was used before block 1 was read.
> Note that StripedBlockReader read from offset 0 of block 1 after the dirty
> buffer had been used.
> {code:java}
> decode from [0, 2, 3, 4, 5, 7] -> [1, 6, 8]
> Check Block(1) first 131072 bytes longest common substring length 4
> Check Block(6) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4
> decode from [0, 2, 3, 4, 5, 6] -> [1, 7, 8]
> Check Block(1) first 131072 bytes longest common substring length 65536
> CHECK AGAIN: Block(1) all 27262976 bytes longest common substring length
> 27197440 # this one
> Check Block(7) first 131072 bytes longest common substring length 4
> Check Block(8) first 131072 bytes longest common substring length 4{code}
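The check described above can be sketched as follows. This is a simplified, hypothetical harness (the class and method names are mine, not from the actual check program): it computes the longest-common-substring length between a decoded block and the block read back from the DN, so a dirty region at the start of a block shows up as an LCS that is shorter than the block by the size of the dirty buffer.

```java
// Hypothetical sketch of the LCS check: compare a decoded block against the
// block read from the DN and report their longest common substring length.
public class LcsCheck {
    // Classic O(n*m) dynamic-programming longest common substring,
    // keeping only one previous DP row to bound memory.
    static int longestCommonSubstring(byte[] a, byte[] b) {
        int[] prev = new int[b.length + 1];
        int best = 0;
        for (int i = 1; i <= a.length; i++) {
            int[] cur = new int[b.length + 1];
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1] == b[j - 1]) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy stand-ins: the first 4 bytes of the decoded block differ,
        // as they would when a dirty buffer corrupts the block's prefix.
        byte[] decoded = "AAAABBBBCCCC".getBytes();
        byte[] fromDn  = "XXXXBBBBCCCC".getBytes();
        System.out.println(longestCommonSubstring(decoded, fromDn)); // prints 8
    }
}
```

On real 27MB blocks an O(n*m) DP is impractical; the actual tool presumably used something cheaper, but the comparison idea is the same.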
> Now I know the dirty buffer causes the reconstruction block error, but
> where does the dirty buffer come from?
> After digging into the code and the DN log, I found that the following DN
> log entries reveal the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel
> java.nio.channels.SocketChannel[connected local=/xxxxxxxx:52586
> remote=/xxxxxxxx:50010]. 180000 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped
> block: BP-714356632-xxxxxxxx-1519726836856:blk_-YYYYYYYYYYYYYY_3472979393
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) {code}
> Reading from a DN may time out (the read is held by a future F) and emit
> the INFO log above, but the futures map containing the future F has
> already been cleared, so
> {code:java}
> return new StripingChunkReadResult(futures.remove(future),
> StripingChunkReadResult.CANCELLED); {code}
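The NPE mechanism can be illustrated in isolation. Once the map entry is gone, `futures.remove(future)` returns null, and passing that null where an `int` is expected auto-unboxes it and throws. A minimal sketch with hypothetical names (a plain `Map` stands in for the real futures map):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of the failure mode: remove() on a cleared map
// returns null, and unboxing that null into an int throws an NPE.
public class NpeDemo {
    static int lookupIndex(Map<String, Integer> futures, String future) {
        // Mirrors new StripingChunkReadResult(futures.remove(future), ...):
        // the Integer result is auto-unboxed, so a null return throws.
        return futures.remove(future);
    }

    public static void main(String[] args) {
        Map<String, Integer> futures = new HashMap<>();
        futures.put("f1", 0);
        futures.clear(); // e.g. cleared after a read timeout
        try {
            lookupIndex(futures, "f1");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the stack trace above");
        }
    }
}
```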
> futures.remove(future) returns null, which causes the NPE and fails the EC
> reconstruction. Then, in the finally phase, the code snippet in
> *getStripedReader().close()*
> {code:java}
> reconstructor.freeBuffer(reader.getReadBuffer());
> reader.freeReadBuffer();
> reader.closeBlockReader(); {code}
> frees the buffer first, but the StripedBlockReader still holds the buffer
> and writes to it.
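The close-ordering hazard can be sketched as follows. The names here are hypothetical and this is not the actual patch; the point is only that recycling the read buffer before the reader has been closed leaves a window in which a still-running StripedBlockReader writes into a buffer that may already have been handed to the next reconstruction, producing exactly the dirty-buffer corruption described above.

```java
// Sketch of the ordering hazard, with hypothetical names.
class Reader {
    byte[] readBuffer = new byte[64 * 1024];
    byte[] getReadBuffer() { return readBuffer; }
    void freeReadBuffer() { readBuffer = null; }
    void closeBlockReader() { /* stop the underlying block reader */ }
}

public class CloseOrdering {
    // Order matching the snippet above: the buffer is recycled while the
    // reader may still hold it and write into it.
    static void closeBuggy(Reader reader) {
        recycleBuffer(reader.getReadBuffer()); // buffer returned to pool...
        reader.freeReadBuffer();
        reader.closeBlockReader();             // ...before the reader stops
    }

    // One safer ordering: quiesce the reader first, then release the buffer,
    // so no live reader thread can write into a recycled buffer.
    static void closeSafer(Reader reader) {
        reader.closeBlockReader();
        recycleBuffer(reader.getReadBuffer());
        reader.freeReadBuffer();
    }

    static void recycleBuffer(byte[] buf) { /* return buffer to a pool */ }

    public static void main(String[] args) {
        closeSafer(new Reader());
        System.out.println("ok");
    }
}
```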
--
This message was sent by Atlassian Jira
(v8.3.4#803005)