[
https://issues.apache.org/jira/browse/HDFS-16601?focusedWorklogId=775625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775625
]
ASF GitHub Bot logged work on HDFS-16601:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/May/22 04:32
Start Date: 28/May/22 04:32
Worklog Time Spent: 10m
Work Description: ZanderXu opened a new pull request, #4369:
URL: https://github.com/apache/hadoop/pull/4369
Detail info please refer to
[HDFS-16601](https://issues.apache.org/jira/browse/HDFS-16601).
Bug stack like:
```
java.io.IOException: Failed to replace a bad datanode on the existing
pipeline due to no more good datanodes being available to try. (Nodes:
current=[DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK],
DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK]],
original=[DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK],
DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK]]).
The current failed datanode replacement policy is DEFAULT, and a client may
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
in its configuration.
at
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1418)
at
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1478)
at
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1704)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1605)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587)
at
org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674)
```
Issue Time Tracking
-------------------
Worklog Id: (was: 775625)
Remaining Estimate: 0h
Time Spent: 10m
> Failed to replace a bad datanode on the existing pipeline due to no more good
> datanodes being available to try
> --------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16601
> URL: https://issues.apache.org/jira/browse/HDFS-16601
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In our production environment, we found a bug and stack like:
> {code:java}
> java.io.IOException: Failed to replace a bad datanode on the existing
> pipeline due to no more good datanodes being available to try. (Nodes:
> current=[DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK],
>
> DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK]],
>
> original=[DatanodeInfoWithStorage[127.0.0.1:59670,DS-0d652bc2-1784-430d-961f-750f80a290f1,DISK],
>
> DatanodeInfoWithStorage[127.0.0.1:59687,DS-b803febc-7b22-4144-9b39-7bf521cdaa8d,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may
> configure this via
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its
> configuration.
> at
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1418)
> at
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1478)
> at
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1704)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1605)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587)
> at
> org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674)
> {code}
> And the root cause is that DFSClient cannot perceive the exception of
> TransferBlock during PipelineRecovery. If failed during TransferBlock, the
> DFSClient will retry all datanodes in the cluster and then failed.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]