[ https://issues.apache.org/jira/browse/HDFS-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883610#comment-16883610 ]

Stephen O'Donnell commented on HDFS-14644:
------------------------------------------

I agree this is expected behaviour: by definition, decommission must ensure 
all blocks on the node reach their target replication before it can complete. 
If there are not enough nodes remaining (that also satisfy the placement 
policy), that goal can never be reached and decommission will never complete. 
This is similar to decommissioning one node of a 3-node cluster with 
replication factor 3, which will never make progress.

I agree we could do better here. We already check all blocks on the host, and 
we know the number of live nodes, so if a block's target replication exceeds 
the number of live nodes we could emit a warning. Checking the placement 
policy as well would be trickier, but the policy is less likely to be the 
issue.
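
To make that concrete, here is a minimal sketch of the feasibility check. 
This is not the real BlockManager API; the class, method, and parameter names 
are all invented for illustration:

public final class DecommissionFeasibilityCheck {

    /**
     * True if every block on the decommissioning nodes can, in principle,
     * reach its target replication on the nodes that remain live. Placement
     * policy constraints (racks, storage types) are deliberately ignored
     * here, since as noted above they are harder to verify.
     */
    static boolean canDecommissionComplete(int maxTargetReplication,
                                           int liveDatanodes,
                                           int decommissioningDatanodes) {
        int remaining = liveDatanodes - decommissioningDatanodes;
        return maxTargetReplication <= remaining;
    }

    public static void main(String[] args) {
        // The scenario from the quoted log: replication factor 10, but too
        // few live nodes would remain, so decommission can never complete.
        // The cluster size of 8 is an assumed figure for illustration.
        int targetReplication = 10;
        int liveNodes = 8;
        int decommissioning = 1;

        if (!canDecommissionComplete(targetReplication, liveNodes, decommissioning)) {
            System.out.println("WARN: decommission cannot complete: "
                + targetReplication + " replicas required but only "
                + (liveNodes - decommissioning) + " live nodes would remain");
        }
    }
}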

This is somewhat related to something I have been thinking about for 
maintenance mode: it would be great to have a command that, for a given set of 
nodes, reports how many blocks would need to be replicated before maintenance 
mode can be entered. The same sort of check could report how many blocks would 
be impossible to replicate if a given set of nodes were decommissioned. Beyond 
the idea itself, I have not thought about how to implement this yet, though a 
rough sketch of what such a dry run might count follows below.
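
As a rough sketch only: the block index interface and record types below are 
hypothetical stand-ins invented for this email, not HDFS classes, but they 
show the shape of the dry-run report:

import java.util.List;
import java.util.Set;

final class MaintenanceDryRun {

    /** Hypothetical lookup for all blocks with a replica on the given nodes. */
    interface BlockIndex {
        List<HypotheticalBlock> blocksOn(Set<String> nodes);
    }

    record HypotheticalBlock(int targetReplication, Set<String> replicaNodes) {}

    record Report(long blocksToReplicate, long impossibleBlocks) {}

    /**
     * For a candidate set of nodes, count how many blocks would need extra
     * replicas before those nodes could leave service, and how many could
     * never reach their target replication on the remaining live nodes.
     */
    static Report dryRun(BlockIndex index, Set<String> candidates, int liveNodes) {
        long toReplicate = 0;
        long impossible = 0;
        int remaining = liveNodes - candidates.size();
        for (HypotheticalBlock b : index.blocksOn(candidates)) {
            long surviving = b.replicaNodes().stream()
                .filter(n -> !candidates.contains(n))
                .count();
            if (surviving < b.targetReplication()) {
                toReplicate++;
                if (b.targetReplication() > remaining) {
                    impossible++;   // the HDFS-14644 situation: can never heal
                }
            }
        }
        return new Report(toReplicate, impossible);
    }
}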

> Failed block replication leaves decommission blocked when a block's 
> replication factor is greater than the number of datanodes
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14644
>                 URL: https://issues.apache.org/jira/browse/HDFS-14644
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.1.1, 2.9.2, 3.0.3, 2.8.5, 2.7.7
>            Reporter: Lisheng Sun
>            Priority: Major
>
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
> required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy


