[
https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282880#comment-13282880
]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------
No, all failures are unrelated to the patch.
I looked through the Jenkins logs.
# org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
This one fails because the previous test sets the xceiver count in the config to 2
and never resets it. Creating a large file in testGetFileChecksum therefore
eventually fails, because the DNs refuse to start more xceiver threads
(a sketch of an isolation fix follows the list):
{code}
java.io.IOException: Xceiver count 3 exceeds the limit of concurrent xcievers: 2
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:143)
at java.lang.Thread.run(Thread.java:662)
{code}
# org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1
Fails because DFSTestUtil.waitCorruptReplicas() is timing-sensitive. It reads
the file 50 times and after each read checks whether the corruption has been
detected (a sketch of a deadline-based wait follows the list). That was enough
time for the DN to restart, but not enough for the NN to detect the corruption:
I looked for "NameSystem.addToCorruptReplicasMap:" and it is not in the logs.
By the way, testBlockCorruptionRecoveryPolicy2, which corrupts 2 replicas
instead of one, worked fine.
# org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks
Fails for the same reason. I see fifty "Waiting for 1 corrupt replicas"
messages, which means 50 reads have been done, but no "addToCorruptReplicasMap",
so the corruption was never detected.
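For #1, a minimal sketch of the isolation fix I have in mind, assuming JUnit 4
and the MiniDFSCluster builder API; the class name is made up and the
file-creation body is elided:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestGetFileChecksumIsolated {
  @Test
  public void testGetFileChecksum() throws Exception {
    // Build a fresh Configuration with defaults only, so the xceiver
    // limit of 2 set by an earlier test cannot leak into this one.
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      // ... create the large file and check getFileChecksum() here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}
Alternatively the earlier test could restore the shared conf in a finally
block, but a per-test Configuration is less fragile.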
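For #2 and #3, a rough sketch of a less timing-sensitive wait, bounded by
wall-clock time instead of a fixed 50 reads; the counter hook is hypothetical,
not the current DFSTestUtil.waitCorruptReplicas() signature:
{code}
import java.io.IOException;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;

public class CorruptReplicaWait {
  /** Hypothetical hook that asks the NN how many corrupt replicas it knows of. */
  public interface CorruptReplicaCounter {
    int count() throws IOException;
  }

  public static void waitCorruptReplicas(FileSystem fs, Path file,
      CorruptReplicaCounter counter, int expected)
      throws IOException, InterruptedException, TimeoutException {
    // Poll against a deadline rather than a fixed number of reads, so a
    // slow DN restart cannot exhaust the retry budget by itself.
    long deadline = System.currentTimeMillis() + 60 * 1000L;
    while (counter.count() != expected) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Waiting for " + expected
            + " corrupt replicas timed out");
      }
      // Each read gives a DN a chance to hit the bad checksum and report
      // the corrupt replica to the NN; reads may fail while it is corrupt.
      try {
        DFSTestUtil.readFile(fs, file);
      } catch (IOException ignored) {
      }
      Thread.sleep(1000);
    }
  }
}
{code}
With a deadline, a slow DN restart just consumes part of the budget, and a
genuine failure surfaces as a clear timeout instead of a flaky assertion.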
I can file JIRAs for these flaky tests. I resubmitted the build in case I
missed something.
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
> Key: HDFS-3368
> URL: https://issues.apache.org/jira/browse/HDFS-3368
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0, 1.0.0, 2.0.0-alpha, 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: blockDeletePolicy-0.22.patch,
> blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch,
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down
> during a cluster restart, resulting in data loss.