[ https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240097#comment-15240097 ]
Mingliang Liu commented on HDFS-10283:
--------------------------------------
This failure happens in our internal daily UT Jenkins runs, and in recent Apache
trunk pre-commit builds (e.g.
https://builds.apache.org/job/PreCommit-HDFS-Build/15140/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt).
Before this exception is thrown, the NN complains about failing to place enough
replicas, as follows:
{code}
2016-04-12 13:21:30,511 WARN blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-04-12 13:21:30,511 WARN blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2016-04-12 13:21:30,511 WARN protocol.BlockStoragePolicy
(BlockStoragePolicy.java:chooseStorageTypes(162)) - Failed to place enough
replicas: expected size is 1 but only 0 storage types can be selected
(replication=3, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK],
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
replicationFallbacks=[ARCHIVE]})
2016-04-12 13:21:30,512 WARN blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(380)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All
required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
{code}
The basic problem here is that the number of datanodes equals the replication
factor (which is 3 in the test). If a datanode in the write (append) pipeline
fails, there is no spare datanode to replace it with. The block manager logs
the above warnings but does not fail the request; that's why we did not see any
exception thrown by the NN. The client side then detects that no new DN was
allocated to replace the failed node in the pipeline, and the DataStreamer
throws java.io.IOException: Failed to replace a bad datanode on the existing
pipeline due to no more good datanodes being available to try.
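A minimal sketch of two possible fixes, assuming a MiniDFSCluster-based setup
like the one TestFSImageWithSnapshot uses (the class name and the datanode
count of 4 below are illustrative, not taken from the test): either relax the
client-side replacement policy named in the exception message, or start more
datanodes than the replication factor so a spare node is always available.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Hypothetical sketch, not the committed fix for this JIRA.
public class PipelineRecoverySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();

    // Option A: relax the client-side policy named in the exception, so a
    // failed append pipeline continues on the remaining datanodes instead
    // of demanding a replacement node (valid values: DEFAULT, ALWAYS, NEVER).
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
        "NEVER");

    // Option B: start more datanodes than the replication factor (3 in the
    // test), so the NN always has a spare node to offer as a replacement.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(4)
        .build();
    try {
      cluster.waitActive();
      // ... run the append / save image / load image steps of the test ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}
Of the two, raising the datanode count is probably the less invasive change
for a test that exercises append pipelines, since it keeps the default client
replacement behavior intact.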
> o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10283
> URL: https://issues.apache.org/jira/browse/HDFS-10283
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.8.0
> Reporter: Mingliang Liu
> Assignee: Mingliang Liu
>
> The test fails with an exception as follows:
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK], DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
> 	at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
> 	at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
> 	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
> 	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
> 	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)
> {code}