[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200625#comment-14200625 ]

Zhe Zhang commented on HDFS-7374:
---------------------------------

This use case has an objective that (slightly) conflicts with that of 
HDFS-6791. I can think of 2 options to accommodate both scenarios:

# Set a timeout (e.g., 10 minutes) limiting how long a dead DN can stay in 
the DECOMMISSION_INPROGRESS state.
# If a DN is already dead when decommissioning starts, indicating that the 
user is intentionally decommissioning a dead node, allow it to enter the 
decommission-complete state directly (see the sketch after this list).
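
To make option 2 concrete, here is a minimal, self-contained sketch. All 
names in it (DnState, DatanodeInfo, startDecommission) are hypothetical 
stand-ins for illustration, not the actual NameNode decommissioning code:

{code:java}
// Hedged sketch of option 2 only; option 1 would instead attach a timer to
// the DECOMMISSION_INPROGRESS transition. All names are illustrative.
public class DeadNodeDecommissionSketch {

  enum DnState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  static class DatanodeInfo {
    final String name;
    final boolean alive; // in HDFS this would be derived from heartbeat recency
    DnState state = DnState.NORMAL;
    DatanodeInfo(String name, boolean alive) {
      this.name = name;
      this.alive = alive;
    }
  }

  // A node that is already dead when decommissioning starts is assumed to be
  // intentionally retired, so it skips straight to DECOMMISSIONED instead of
  // waiting in DECOMMISSION_INPROGRESS forever.
  static void startDecommission(DatanodeInfo dn) {
    dn.state = dn.alive ? DnState.DECOMMISSION_INPROGRESS
                        : DnState.DECOMMISSIONED;
  }

  public static void main(String[] args) {
    DatanodeInfo dead = new DatanodeInfo("dn1", false);
    DatanodeInfo live = new DatanodeInfo("dn2", true);
    startDecommission(dead);
    startDecommission(live);
    System.out.println(dead.name + " -> " + dead.state); // dn1 -> DECOMMISSIONED
    System.out.println(live.name + " -> " + live.state); // dn2 -> DECOMMISSION_INPROGRESS
  }
}
{code}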

[~mingma] and [~jingzhao], please advise whether these options look 
reasonable to you, whether you prefer one over the other, or whether you 
would suggest any other approaches. Thanks!

> Allow decommissioning of dead DataNodes
> ---------------------------------------
>
>                 Key: HDFS-7374
>                 URL: https://issues.apache.org/jira/browse/HDFS-7374
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have seen the use case of decommissioning DataNodes that are already dead 
> or unresponsive, and are not expected to rejoin the cluster.
> The logic introduced by HDFS-6791 will mark those nodes as 
> {{DECOMMISSION_INPROGRESS}}, in the hope that they can come back and finish 
> the decommission work. If an upper-layer application is monitoring the 
> decommissioning progress, it will hang forever.


