[ 
https://issues.apache.org/jira/browse/HDDS-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated HDDS-4589:
----------------------------
    Description: 
h4. Problem: 

ReplicationManager maintains the in-flight replication and deletion in-memory, 
which is not replicated using Ratis. So, theoretically it’s possible that we 
might run into issues if we immediately start ReplicationManager after a 
failover.

Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, 
CR5 and CR6. The container is over replicated, so the current SCM S1 decides to 
delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this 
information is updated in the in-flight deletion list and deletion commands are 
sent to the datanodes. If there is a failover at this point and SCM S2 becomes 
leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds 
the container C1 to be over replicated. Theoretically it’s possible that SCM S2 
picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data 
loss.

To address this issue we will make the logic to select a replica for deletion 
deterministic. This will make sure that the new leader after failover will pick 
the same replica for deletion which was picked by the old leader. 
h4. Approach: 

Sort the candidate replicas. Delete excess replicas from small to large. There 
will not be any DeleteContainerCommand for the largest 3 Replica sent by any 
SCM.
h4. Example:

Assume there are 6 replicas of the container C1, the factor of C1 is 3, the 
names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be 
any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:

If SCM sees less than or equal to 3 replicas, it won’t send any 
DeleteContainerCommand.

If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t 
send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.

If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.

If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.
h4. P.S.:

Since this issue exists in master as well, e.g., a quickly restart of SCM, we 
decide to fix this problem in master. 

 

  was:
 

ReplicationManager maintains the in-flight replication and deletion in-memory, 
which is not replicated using Ratis. So, theoretically it’s possible that we 
might run into issues if we immediately start ReplicationManager after a 
failover.

Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, 
CR5 and CR6. The container is over replicated, so the current SCM S1 decides to 
delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, this 
information is updated in the in-flight deletion list and deletion commands are 
sent to the datanodes. If there is a failover at this point and SCM S2 becomes 
leader, it doesn’t have the in-flight deletion list from SCM S1 and it finds 
the container C1 to be over replicated. Theoretically it’s possible that SCM S2 
picks CR4, CR5 and CR6 for deletion. If this happens, we will end up in data 
loss.

To address this issue we will make the logic to select a replica for deletion 
deterministic. This will make sure that the new leader after failover will pick 
the same replica for deletion which was picked by the old leader.

 
h4. Approach: 

Sort the candidate replicas. Delete excess replicas from small to large. There 
will not be any DeleteContainerCommand for the largest 3 Replica sent by any 
SCM.
h4. Example:

Assume there are 6 replicas of the container C1, the factor of C1 is 3, the 
names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be 
any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:

If SCM sees less than or equal to 3 replicas, it won’t send any 
DeleteContainerCommand.

If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t 
send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.

If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.

If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it 
won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
contradiction.

 

Since this issue exists in master as well, e.g., a quickly restart of SCM, we 
decide to fix this problem in master. 

 


> Tackle potential data loss during 
> ReplicationManager.handleOverReplicatedContainer()
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-4589
>                 URL: https://issues.apache.org/jira/browse/HDDS-4589
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM HA
>    Affects Versions: 1.1.0
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>             Fix For: 1.1.0
>
>
> h4. Problem: 
> ReplicationManager maintains the in-flight replication and deletion 
> in-memory, which is not replicated using Ratis. So, theoretically it’s 
> possible that we might run into issues if we immediately start 
> ReplicationManager after a failover.
> Scenario: There are 6 replicas of the container C1 namely CR1, CR2, CR3, CR4, 
> CR5 and CR6. The container is over replicated, so the current SCM S1 decides 
> to delete the excess replicas. SCM S1 picks CR1, CR2 and CR3 for deletion, 
> this information is updated in the in-flight deletion list and deletion 
> commands are sent to the datanodes. If there is a failover at this point and 
> SCM S2 becomes leader, it doesn’t have the in-flight deletion list from SCM 
> S1 and it finds the container C1 to be over replicated. Theoretically it’s 
> possible that SCM S2 picks CR4, CR5 and CR6 for deletion. If this happens, we 
> will end up in data loss.
> To address this issue we will make the logic to select a replica for deletion 
> deterministic. This will make sure that the new leader after failover will 
> pick the same replica for deletion which was picked by the old leader. 
> h4. Approach: 
> Sort the candidate replicas. Delete excess replicas from small to large. 
> There will not be any DeleteContainerCommand for the largest 3 Replica sent 
> by any SCM.
> h4. Example:
> Assume there are 6 replicas of the container C1, the factor of C1 is 3, the 
> names of the replicas are CR1, CR2, CR3, CR4, CR5 and CR6. There will not be 
> any DeleteContainerCommand for CR4, CR5, CR6 sent by any SCM:
> If SCM sees less than or equal to 3 replicas, it won’t send any 
> DeleteContainerCommand.
> If SCM sees 4 replicas, It deletes the smallest replica, which means it won’t 
> send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
> contradiction.
> If SCM sees 5 replicas, It deletes the smallest 2 replicas, which means it 
> won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
> contradiction.
> If SCM sees 6 replicas, It deletes the smallest 3 replicas, which means it 
> won’t send DeleteContainerCommand for CR4 CR5, CR6, otherwise there will be a 
> contradiction.
> h4. P.S.:
> Since this issue exists in master as well, e.g., a quickly restart of SCM, we 
> decide to fix this problem in master. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to