[jira] [Work logged] (HDFS-16146) All three replicas are lost due to not adding a new DataNode in time

ASF GitHub Bot (Jira) Tue, 03 Aug 2021 09:23:04 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16146?focusedWorklogId=633028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633028
 ]


ASF GitHub Bot logged work on HDFS-16146:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Aug/21 16:22
            Start Date: 03/Aug/21 16:22
    Worklog Time Spent: 10m 
      Work Description: Hexiaoqiao merged pull request #3247:
URL: https://github.com/apache/hadoop/pull/3247


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 633028)
    Time Spent: 1.5h  (was: 1h 20m)

> All three replicas are lost due to not adding a new DataNode in time
> --------------------------------------------------------------------
>
>                 Key: HDFS-16146
>                 URL: https://issues.apache.org/jira/browse/HDFS-16146
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We had a file with three replicas, and every replica of one of its blocks 
> was lost while the default datanode replacement strategy was in use. The 
> sequence of events was:
> 1. addBlock() requests a new block and connects to three datanodes (dn1, 
> dn2 and dn3) to build a pipeline.
> 2. The client writes data.
> 3. dn1 fails and is removed from the pipeline. More than one datanode 
> remains, so under the replacement strategy no new datanode is added.
> 4. Writing completes and the pipeline enters PIPELINE_CLOSE.
> 5. dn2 fails and is removed from the pipeline. Because the pipeline is 
> already in the close phase, addDatanode2ExistingPipeline() leaves the task 
> of transferring the replica to the NameNode. Only one datanode is now left 
> in the pipeline.
> 6. dn3 fails, and all replicas are lost.
> Adding a new datanode in step 5 would avoid losing all replicas in this 
> case. I think an error in PIPELINE_CLOSE carries the same risk of losing 
> replicas as an error in DATA_STREAMING, so we should not skip adding a new 
> datanode during PIPELINE_CLOSE.
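
The failure sequence in the description can be replayed with a small, self-contained model. This is only a sketch and not the HDFS client code: the class, enum, and method names are hypothetical, and the replacement condition is an assumption taken from step 3 of the report (with replication 3, no datanode is added while more than one remains). It also assumes that a replacement added in step 5 receives the replica before dn3 fails.

```java
// Simplified model of the reported scenario; NOT the actual HDFS client code.
// All names below are hypothetical and exist only for this illustration.
final class PipelineCloseSketch {
    enum Stage { DATA_STREAMING, PIPELINE_CLOSE }

    // Assumed replacement condition, modeled on step 3 of the report:
    // with replication 3, no replacement while more than one datanode remains.
    static boolean policyWantsReplacement(int replication, int remaining) {
        return replication >= 3 && remaining <= replication / 2;
    }

    // skipDuringClose = true models the reported behavior;
    // skipDuringClose = false models the change proposed in the report.
    static boolean addsDatanode(Stage stage, int replication, int remaining,
                                boolean skipDuringClose) {
        if (stage == Stage.PIPELINE_CLOSE && skipDuringClose) {
            return false; // replica transfer is left to the NameNode
        }
        return policyWantsReplacement(replication, remaining);
    }

    // Replays steps 3 to 6 and returns the number of replicas left after dn3 fails.
    static int replicasAfterScenario(boolean skipDuringClose) {
        int replication = 3;
        int live = 3;

        live--; // step 3: dn1 fails during DATA_STREAMING
        if (addsDatanode(Stage.DATA_STREAMING, replication, live, skipDuringClose)) {
            live++;
        }

        live--; // step 5: dn2 fails during PIPELINE_CLOSE
        if (addsDatanode(Stage.PIPELINE_CLOSE, replication, live, skipDuringClose)) {
            live++; // assumes the new datanode gets the replica in time
        }

        live--; // step 6: dn3 fails
        return live;
    }

    public static void main(String[] args) {
        System.out.println("skip during close: " + replicasAfterScenario(true));
        System.out.println("add during close:  " + replicasAfterScenario(false));
    }
}
```

In this model, skipping the replacement during PIPELINE_CLOSE leaves no replica after the third failure, while applying the same check as in DATA_STREAMING leaves one.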



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

