GlenGeng opened a new pull request #1700:
URL: https://github.com/apache/ozone/pull/1700


   ## What changes were proposed in this pull request?
   
   **Problem:** 
   ReplicationManager maintains the in-flight replication and deletion 
in-memory, which is not replicated using Ratis. So, theoretically it’s possible 
that we might run into issues if we immediately start ReplicationManager after 
a failover.
   Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, 
CR4, CR5 and CR6. The container is over replicated, so the current SCM S1 
decides to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for 
deletion, this information is updated in the in-flight deletion list and 
deletion commands are sent to the datanodes. If there is a failover at this 
point and SCM S2 becomes leader, it doesn’t have the in-flight deletion list 
from SCM S1 and it finds the container C1 to be over replicated. Theoretically 
it’s possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, 
we will end up in data loss.
   To address this issue we will make the logic to select a replica for 
deletion deterministic. This will make sure that the new leader after failover 
will pick the same replica for deletion which was picked by the old leader. 
   
   **Approach:** 
   Sort the candidate replicas. Delete excess replicas from small to large. 
There will not be any DeleteContainerCommand for the largest 3 Replica sent by 
any SCM.
   
   **Example:**
   Assume there are 6 replicas of the container C1, the factor of C1 is 3, the 
names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be 
any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:
   If SCM sees less than or equal to 3 replicas, it won’t send any 
DeleteContainerCommand.
   If SCM sees 4 replicas, It deletes the smallest replica, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.
   If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.
   If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.
   
   **P.S.:**
   Since this issue exists in master as well, e.g., a quickly restart of SCM, 
we decide to fix this problem in master. 
   
   ## What is the link to the Apache JIRA
   
   (Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)
   
   Please replace this section with the link to the Apache JIRA)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. Ex: unit tests, manual tests)
   (If this patch involves UI changes, please attach a screen-shot; otherwise, 
remove this)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to