[
https://issues.apache.org/jira/browse/HDFS-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034052#comment-16034052
]
Hadoop QA commented on HDFS-11472:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 120 unchanged - 1 fixed = 122 total (was 121) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 28s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 3s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
| | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11472 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870896/HDFS-11472.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 8fd774ea4781 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7101477 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19735/testReport/ |
| asflicense | https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/patch-asflicense-problems.txt |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19735/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Fix inconsistent replica size after a data pipeline failure
> -----------------------------------------------------------
>
> Key: HDFS-11472
> URL: https://issues.apache.org/jira/browse/HDFS-11472
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Critical
> Labels: release-blocker
> Attachments: HDFS-11472.001.patch, HDFS-11472.002.patch, HDFS-11472.testcase.patch
>
>
> We observed a case where a replica's on-disk length is less than its acknowledged
> length, breaking the assumption in the recovery code (a sketch of the violated
> invariant follows the trace below):
> {noformat}
> 2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
> java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
>   getNumBytes()     = 27530
>   getBytesOnDisk()  = 27006
>   getVisibleLength()= 27268
>   getVolume()       = /data/6/hdfs/datanode/current
>   getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
>   bytesAcked=27268
>   bytesOnDisk=27006
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
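> The message above encodes the invariant the recovery path relies on: a replica
> must never have acknowledged more bytes to the client than it has flushed to
> disk. A minimal self-contained sketch of that check (hypothetical stand-in
> types; the real logic lives in {{FsDatasetImpl#initReplicaRecovery}}):
> {code:java}
> import java.io.IOException;
>
> // Hypothetical stand-in for the replica state involved; the real class is
> // org.apache.hadoop.hdfs.server.datanode.ReplicaBeingWritten.
> interface ReplicaState {
>   long getBytesOnDisk();    // bytes actually flushed to the block file
>   long getVisibleLength();  // bytes acknowledged to the client (bytesAcked)
> }
>
> class RecoveryInvariantSketch {
>   static void checkRecoveryInvariant(ReplicaState rip) throws IOException {
>     // Recovery assumes every acknowledged byte is already durable on disk.
>     if (rip.getBytesOnDisk() < rip.getVisibleLength()) {
>       // HDFS-11472: this "impossible" state is reachable when receivePacket()
>       // fails after writing data but before updating the in-memory length.
>       throw new IOException("getBytesOnDisk() < getVisibleLength(): " + rip);
>     }
>   }
> }
> {code}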
> It turns out that if an exception is thrown within
> {{BlockReceiver#receivePacket}}, the in-memory replica's on-disk length may not
> be updated, even though the data has already been written to disk (sketched
> after the trace below).
> For example, here's one exception we observed:
> {noformat}
> 2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
> java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
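> In outline, the packet path writes data to the block file first and only
> afterwards advances the replica's in-memory on-disk length, so an exception
> between the two steps leaves the counter stale. A hedged skeleton of that
> ordering (hypothetical names; an illustration, not the actual
> {{BlockReceiver}} code):
> {code:java}
> import java.io.IOException;
> import java.io.OutputStream;
>
> // Hypothetical skeleton of the packet-receive path.
> class PacketReceiverSketch {
>   private long inMemoryBytesOnDisk;  // what getBytesOnDisk() later reports
>   private final OutputStream blockOut;
>
>   PacketReceiverSketch(OutputStream blockOut) { this.blockOut = blockOut; }
>
>   void receivePacket(byte[] data) throws IOException {
>     blockOut.write(data);                // 1) bytes reach the block file
>     adjustCrcFilePosition();             // 2) may throw, e.g. ClosedByInterruptException
>     inMemoryBytesOnDisk += data.length;  // 3) skipped if (2) throws: counter goes stale
>   }
>
>   private void adjustCrcFilePosition() throws IOException {
>     // repositions the checksum file channel; an interrupt surfaces here
>   }
> }
> {code}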
> There are potentially other places and causes where an exception can be thrown
> within {{BlockReceiver#receivePacket}}, so it may not make much sense to
> special-case this particular exception. Instead, we should improve the
> replica recovery code to handle the case where the on-disk size is less than
> the acknowledged size, and update the in-memory checksum accordingly.
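> As a rough illustration of that direction (an assumption about the shape of
> the fix, with hypothetical names; not the committed patch):
> {code:java}
> import java.io.File;
>
> // Hedged sketch: when the in-memory on-disk length lags the acknowledged
> // length, consult the block file itself to decide how much to recover.
> class RecoveryLengthSketch {
>   static long resolveRecoveryLength(long bytesAcked, long bytesOnDisk, File blockFile) {
>     if (bytesOnDisk >= bytesAcked) {
>       return bytesOnDisk;  // normal case: the invariant holds
>     }
>     long fileLen = blockFile.length();
>     if (fileLen >= bytesAcked) {
>       // The data made it to disk even though the counter was never advanced;
>       // recover up to the acknowledged length and recompute the checksum of
>       // the last partial chunk from the file contents.
>       return bytesAcked;
>     }
>     // Otherwise recover only what is durably present on disk.
>     return fileLen;
>   }
> }
> {code}
> The idea, as the description suggests, is to trust what is actually on disk
> once the in-memory counter is known to be unreliable, rather than failing
> recovery outright.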