[
https://issues.apache.org/jira/browse/HDDS-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nanda kumar resolved HDDS-4589.
-------------------------------
Resolution: Fixed
> Handle potential data loss during
> ReplicationManager.handleOverReplicatedContainer()
> ------------------------------------------------------------------------------------
>
> Key: HDDS-4589
> URL: https://issues.apache.org/jira/browse/HDDS-4589
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM HA
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> h4. Problem:
> ReplicationManager maintains the in-flight replication and deletion
> in-memory, which is not replicated using Ratis. So, theoretically it’s
> possible that we might run into issues if we immediately start
> ReplicationManager after a failover.
> Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4,
> CR5 and CR6. The container is over replicated, so the current SCM S1 decides
> to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion,
> this information is updated in the in-flight deletion list and deletion
> commands are sent to the datanodes. If there is a failover at this point and
> SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM
> S1 and it finds the container C1 to be over replicated. Theoretically it’s
> possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we
> will end up in data loss.
> To address this issue we will make the logic to select a replica for deletion
> deterministic. This will make sure that the new leader after failover will
> pick the same replica for deletion which was picked by the old leader.
> h4. Approach:
> Sort the candidate replicas. Delete excess replicas from small to large.
> There will not be any DeleteContainerCommand for the largest 3 Replica sent
> by any SCM.
> h4. Example:
> Assume there are 6 replicas of the container C1, the factor of C1 is 3, the
> names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be
> any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:
> If SCM sees less than or equal to 3 replicas, it won’t send any
> DeleteContainerCommand.
> If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t
> send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a
> contradiction.
> If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it
> won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a
> contradiction.
> If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it
> won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a
> contradiction.
> h4. P.S.:
> Since this issue exists in master as well, e.g., a quickly restart of SCM, we
> decide to fix this problem in master.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]