[ https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240097#comment-15240097 ]

Mingliang Liu commented on HDFS-10283:
--------------------------------------

It happens in our internal daily UT Jenkins runs and in recent Apache trunk 
pre-commit builds (e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/15140/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt).

Before this exception, the NN had complained about not being able to place 
enough replicas, as follows:
{code}
2016-04-12 13:21:30,511 WARN  blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-04-12 13:21:30,511 WARN  blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-04-12 13:21:30,511 WARN  protocol.BlockStoragePolicy 
(BlockStoragePolicy.java:chooseStorageTypes(162)) - Failed to place enough 
replicas: expected size is 1 but only 0 storage types can be selected 
(replication=3, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK], 
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], 
replicationFallbacks=[ARCHIVE]})
2016-04-12 13:21:30,512 WARN  blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
required storage types are unavailable:  unavailableStorages=[DISK, ARCHIVE], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
{code}
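
To make the selection order in these messages explicit: the HOT policy prefers 
DISK for every replica and falls back to ARCHIVE only for re-replication, so 
once both types are marked unavailable nothing can be chosen. Below is a 
simplified, self-contained sketch of that order; it is hypothetical 
illustration code, not the actual BlockStoragePolicy#chooseStorageTypes 
implementation:
{code}
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch of the fallback order visible in the log above,
// NOT the real o.a.h.hdfs.protocol.BlockStoragePolicy code.
enum StorageType { DISK, ARCHIVE }

class HotPolicySketch {
  // HOT policy: storageTypes=[DISK], replicationFallbacks=[ARCHIVE].
  static List<StorageType> chooseStorageTypes(int needed,
                                              EnumSet<StorageType> unavailable) {
    List<StorageType> chosen = new ArrayList<>();
    for (int i = 0; i < needed; i++) {
      if (!unavailable.contains(StorageType.DISK)) {
        chosen.add(StorageType.DISK);      // preferred storage type
      } else if (!unavailable.contains(StorageType.ARCHIVE)) {
        chosen.add(StorageType.ARCHIVE);   // replication fallback
      }
      // Otherwise nothing can be selected, matching the log line
      // "expected size is 1 but only 0 storage types can be selected".
    }
    return chosen;
  }
}
{code}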

The basic problem here is that the number of datanodes equals the replication 
factor (3 in this test). If a datanode in the write (append) pipeline fails, 
there is no spare datanode to replace it with. The block manager logs the 
warnings above but does not fail the request, which is why we see no exception 
thrown by the NN. On the client side, the DataStreamer detects that no new DN 
was allocated to replace the failed node in the pipeline and throws 
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try.
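
For reference, the two usual ways a test can avoid this failure mode, as a 
minimal sketch assuming the test sets up its cluster via MiniDFSCluster (the 
class name, numDataNodes value, and the combination of both options in one 
program are mine for illustration; the config key is the one named in the 
exception):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class PipelineRecoverySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();

    // Option 1 (client side): relax the replace-datanode-on-failure policy
    // so an append pipeline keeps going on the remaining good nodes instead
    // of failing when no replacement DN is available.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
        "NEVER");

    // Option 2 (cluster side): start more datanodes than the replication
    // factor so a failed pipeline node can actually be replaced. Either
    // option alone would be enough.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(4)   // replication is 3 in the test; 4 leaves a spare
        .build();
    try {
      cluster.waitActive();
      // ... run the append workload here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}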

> o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending
>  fails intermittently
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10283
>                 URL: https://issues.apache.org/jira/browse/HDFS-10283
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.8.0
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>
> The test fails with exception as following: 
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>       at 
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
>       at 
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
>       at 
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
>       at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
>       at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
>       at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)
> {code}


