sodonnel opened a new pull request, #4118:
URL: https://github.com/apache/ozone/pull/4118

   ## What changes were proposed in this pull request?
   
   In EC, a container is considered missing and under-replicated if it has lost 
enough replicas that offline reconstruction is not possible. If any of the 
remaining replicas for this container are on a datanode that is being 
decommissioned, the decommissioning will not proceed. All the containers on 
that node must be restored to proper replication for it to finish 
decommissioning, but the code will not copy the replica of the missing 
container to a different node.
   
   There are 3 parts to fixing this problem:
   
   In DatanodeAdminMonitorImpl, inside the method 
checkContainersReplicatedOnNode, we call 
ECContainerReplicaCount.isSufficientlyReplicated() to decide whether the 
container is adequately replicated. Even if we address the other two parts 
described below, this is still a problem, as the container is unrecoverable 
and so can never be sufficiently replicated. For EC containers in the 
decommission monitor, perhaps we need a different check, i.e. that the 
replica on the host being checked is also available on another IN_SERVICE 
host. From a decommission point of view, we don't care whether the entire EC 
container is sufficiently replicated - we just care that the replica on the 
current host has a copy elsewhere.
   
   In ECReplicationCheckHandler, we deliberately skip adding "unrecoverable" 
containers to the under-replicated queue, as we previously believed there was 
no point in adding them: they cannot be recovered anyway. However, this 
decommission issue is specific to EC, so we should allow the container to make 
it onto the under-replicated queue if it has decommissioning or maintenance 
indexes.
   
   In ECUnderReplicationHandler, we need to check that the decommissioning 
indexes are copied correctly, even if the container is otherwise unrecoverable.
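   
   The per-replica check and the queueing rule described above can be 
sketched roughly as follows. This is a minimal illustration, not the code in 
this patch: the Replica class, the NodeState enum and both method names are 
hypothetical stand-ins, and the real SCM classes carry far more state.

```java
import java.util.List;

public class EcDecommissionCheck {

    // Hypothetical, simplified node states (assumption, not the SCM enum).
    enum NodeState { IN_SERVICE, DECOMMISSIONING, ENTERING_MAINTENANCE }

    // Hypothetical, simplified replica: the EC index it holds and where it lives.
    static final class Replica {
        final int index;
        final String host;
        final NodeState state;

        Replica(int index, String host, NodeState state) {
            this.index = index;
            this.host = host;
            this.state = state;
        }
    }

    /**
     * Decommission-side check: true if every EC index held on the given host
     * also exists on some other host that is IN_SERVICE. It deliberately
     * ignores whether the container as a whole is recoverable.
     */
    static boolean replicasOnHostAreCopied(List<Replica> replicas, String host) {
        return replicas.stream()
            .filter(r -> r.host.equals(host))
            .allMatch(onHost -> replicas.stream().anyMatch(other ->
                other.index == onHost.index
                    && !other.host.equals(host)
                    && other.state == NodeState.IN_SERVICE));
    }

    /**
     * Queueing rule: an unrecoverable container is still worth queueing if
     * any of its replicas is on a decommissioning or maintenance node.
     */
    static boolean shouldQueueUnrecoverable(List<Replica> replicas) {
        return replicas.stream().anyMatch(r -> r.state != NodeState.IN_SERVICE);
    }
}
```

   In this sketch, a container with only two surviving indexes, one of them 
on a decommissioning host, fails the first check until that index has been 
copied to an IN_SERVICE host, at which point the host no longer blocks 
decommissioning even though the container itself stays unrecoverable.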
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7666
   
   ## How was this patch tested?
   
   New unit tests added.

