siddhantsangwan commented on PR #4227:
URL: https://github.com/apache/ozone/pull/4227#issuecomment-1425830248

   > When checking for over replication of healthy replicas, we have a stricter 
check in RatisReplicationCheckHandler:
   > 
   >       if (!overHealth.isReplicatedOkAfterPending() &&
   >           !overHealth.hasMismatchedReplicas()) {
   >         request.getReplicationQueue().enqueue(overHealth);
   >       }
   > This means we don't enqueue at all if there are some mismatched replicas. 
This is the same point (number 5) that I put in the description above. This 
behaviour is different from Legacy but I couldn't come up with a good way to 
check for sufficient number of matching replicas. Legacy checks this by 
constructing RatisContainerReplicaCount with matching replicas only. I tried to 
do that here (in hasSufficientMatchingReplicas that I forgot to delete) but 
realised this can be wrong because the pending ops in that object will include 
ops on other replicas also.
   > 
   > For example, consider a CLOSED container with the following replicas:
   > CLOSED, CLOSED, CLOSED, UNHEALTHY
   > Suppose Legacy sent a delete command for the UNHEALTHY replica, so we have 
a pending delete. In the next RM iteration, if we construct 
RatisContainerReplicaCount with only the matching replicas, we will still have 
a pending delete being counted. This will cause the container to be seen as 
under replicated (3 matching replicas - 1 delete). Any ideas how this can be 
done?
   > 
   > The other place where we're queuing for over rep is in 
RatisUnhealthyReplicationCheckHandler. verifyPerfectReplication checks perfect 
replication with just the healthy replicas. I think you're right, we should be 
checking only with matching replicas here like legacy does in 
handleOverReplicatedExcessUnhealthy. It then becomes the same problem as above. 
Do you think it's ok to queue them here as long as RatisOverReplicationHandler 
tries to delete unhealthy replicas first?
   
   I've tried to resolve this situation. RatisReplicationCheckHandler will 
still not queue for over rep if we have any mismatched replicas. 
RatisUnhealthyReplicationCheckHandler will check if we have sufficient number 
of matching replicas (isSafelyOverReplicated()).
   
   
   Fixed all the other comments except refactoring RatisContainerReplicaCount. 
This PR is huge as is - we can refactor in an improvement jira if required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to