[ https://issues.apache.org/jira/browse/HDFS-15945?focusedWorklogId=577323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-577323 ]
ASF GitHub Bot logged work on HDFS-15945:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Apr/21 05:10
            Start Date: 06/Apr/21 05:10
    Worklog Time Spent: 10m
      Work Description: virajjasani commented on a change in pull request #2854:
URL: https://github.com/apache/hadoop/pull/2854#discussion_r607508681


##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##########
@@ -4584,8 +4584,14 @@ void processExtraRedundancyBlocksOnInService(
    */
   boolean isNodeHealthyForDecommissionOrMaintenance(DatanodeDescriptor node) {
     if (!node.checkBlockReportReceived()) {
-      LOG.info("Node {} hasn't sent its first block report.", node);
-      return false;
+      if (node.getCapacity() == 0 && node.getNumBlocks() == 0) {

Review comment:
> Oh, after thinking about it, it doesn't matter what the capacity is; it may be considered safe to decommission if numBlocks is 0.

Exactly. Thanks @tasanuma

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 577323)
            Time Spent: 1h 10m  (was: 1h)

> DataNodes with zero capacity and zero blocks should be decommissioned immediately
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-15945
>                 URL: https://issues.apache.org/jira/browse/HDFS-15945
>               Project: Hadoop HDFS
>            Issue Type: Bug
>              Reporter: Takanobu Asanuma
>              Assignee: Takanobu Asanuma
>              Priority: Major
>                Labels: pull-request-available
>            Time Spent: 1h 10m
>    Remaining Estimate: 0h
>
> When there is a storage problem, for example, a DataNode's capacity and block count can both become zero.
> When we tried to decommission those DataNodes, we ran into an issue where the decommission did not complete because the NameNode had not received their first block report.
> {noformat}
> INFO blockmanagement.DatanodeAdminManager (DatanodeAdminManager.java:startDecommission(183)) - Starting decommission of 127.0.0.1:58343 [DISK]DS-a29de094-2b19-4834-8318-76cda3bd86bf:NORMAL:127.0.0.1:58343 with 0 blocks
> INFO blockmanagement.BlockManager (BlockManager.java:isNodeHealthyForDecommissionOrMaintenance(4587)) - Node 127.0.0.1:58343 hasn't sent its first block report.
> INFO blockmanagement.DatanodeAdminDefaultMonitor (DatanodeAdminDefaultMonitor.java:check(258)) - Node 127.0.0.1:58343 isn't healthy. It needs to replicate 0 more blocks. Decommission In Progress is still in progress.
> {noformat}
> To make matters worse, even after we stopped these DataNodes, they remained in a dead and decommissioning state until the NameNode restarted.
> I think those DataNodes should be decommissioned immediately even if the NameNode hasn't received the first block report.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
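[Editor's note] The behavior agreed on in the review above (a node that has never sent its first block report is still safe to decommission when it holds zero blocks, regardless of its reported capacity) can be sketched as a small standalone predicate. This is an illustrative sketch only: the class, method, and parameter names below are invented for clarity and are not the actual BlockManager API, which operates on a DatanodeDescriptor.

```java
// Illustrative sketch of the decommission health check discussed in
// HDFS-15945. NOT the real BlockManager code; the real check is
// BlockManager.isNodeHealthyForDecommissionOrMaintenance(DatanodeDescriptor).
public class DecommissionCheckSketch {

    // Hypothetical standalone predicate modeling the fixed logic.
    public static boolean healthyForDecommission(boolean blockReportReceived,
                                                 long numBlocks) {
        if (!blockReportReceived) {
            // Zero blocks means there is nothing to replicate, so the node
            // can be decommissioned immediately; per the review discussion,
            // capacity is irrelevant to this decision.
            return numBlocks == 0;
        }
        // Once the first block report has arrived, the real implementation
        // performs further health checks; this sketch accepts unconditionally.
        return true;
    }

    public static void main(String[] args) {
        // No report yet, zero blocks: safe to decommission.
        System.out.println(healthyForDecommission(false, 0));  // true
        // No report yet, blocks present: must wait for the block report.
        System.out.println(healthyForDecommission(false, 42)); // false
    }
}
```

With the original code, the first case returned false forever, which is exactly the stuck "Decommission In Progress" loop shown in the log excerpt above.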