[
https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282880#comment-13282880
]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------
No, all failures are unrelated to the patch.
I looked through the Jenkins logs.
# org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
This one fails because the previous test sets the xceiver count in the config to 2
and never resets it. Creating a large file in testGetFileChecksum therefore
eventually fails, because the DNs refuse to start more xceiver threads
(a sketch of an isolation fix follows the list):
{code}
java.io.IOException: Xceiver count 3 exceeds the limit of concurrent xcievers: 2
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:143)
at java.lang.Thread.run(Thread.java:662)
{code}
# org.apache.hadoop.hdfs.TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1
Fails because DFSTestUtil.waitCorruptReplicas() is timing-sensitive. It reads
the file 50 times and after each read checks whether the corruption has been
detected (a sketch of a deadline-based wait follows the list). That was enough
time for the DN to restart, but not enough for the NN to detect the corruption:
I looked for "NameSystem.addToCorruptReplicasMap:" and it is not in the logs.
By the way, testBlockCorruptionRecoveryPolicy2, which corrupts 2 replicas
instead of one, worked fine.
# org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks
Fails for the same reason. I see fifty "Waiting for 1 corrupt replicas"
messages, which means 50 reads have been done, but no "addToCorruptReplicasMap",
so the corruption was never detected.
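For #1, a minimal sketch of the isolation fix I have in mind, assuming JUnit 4
and the MiniDFSCluster builder API; the class name is made up and the
file-creation body is elided:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestGetFileChecksumIsolated {
  @Test
  public void testGetFileChecksum() throws Exception {
    // Build a fresh Configuration with defaults only, so the xceiver
    // limit of 2 set by an earlier test cannot leak into this one.
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      // ... create the large file and check getFileChecksum() here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}
Alternatively the earlier test could restore the shared conf in a finally
block, but a per-test Configuration is less fragile.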
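For #2 and #3, a rough sketch of a less timing-sensitive wait, bounded by
wall-clock time instead of a fixed 50 reads; the counter hook is hypothetical,
not the current DFSTestUtil.waitCorruptReplicas() signature:
{code}
import java.io.IOException;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;

public class CorruptReplicaWait {
  /** Hypothetical hook that asks the NN how many corrupt replicas it knows of. */
  public interface CorruptReplicaCounter {
    int count() throws IOException;
  }

  public static void waitCorruptReplicas(FileSystem fs, Path file,
      CorruptReplicaCounter counter, int expected)
      throws IOException, InterruptedException, TimeoutException {
    // Poll against a deadline rather than a fixed number of reads, so a
    // slow DN restart cannot exhaust the retry budget by itself.
    long deadline = System.currentTimeMillis() + 60 * 1000L;
    while (counter.count() != expected) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Waiting for " + expected
            + " corrupt replicas timed out");
      }
      // Each read gives a DN a chance to hit the bad checksum and report
      // the corrupt replica to the NN; reads may fail while it is corrupt.
      try {
        DFSTestUtil.readFile(fs, file);
      } catch (IOException ignored) {
      }
      Thread.sleep(1000);
    }
  }
}
{code}
With a deadline, a slow DN restart just consumes part of the budget, and a
genuine failure surfaces as a clear timeout instead of a flaky assertion.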
I can file JIRAs for these flaky tests. I resubmitted the build in case I
missed something.
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
> Key: HDFS-3368
> URL: https://issues.apache.org/jira/browse/HDFS-3368
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0, 1.0.0, 2.0.0-alpha, 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: blockDeletePolicy-0.22.patch,
> blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch,
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down
> during a cluster restart, resulting in data loss.