[ https://issues.apache.org/jira/browse/HDDS-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YI-CHEN WANG reassigned HDDS-4599:
----------------------------------

    Assignee: YI-CHEN WANG

> Handle inflight delete/add actions in ReplicationManager properly.
> ------------------------------------------------------------------
>
>                 Key: HDDS-4599
>                 URL: https://issues.apache.org/jira/browse/HDDS-4599
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM HA
>    Affects Versions: 1.1.0
>            Reporter: Glen Geng
>            Assignee: YI-CHEN WANG
>            Priority: Major
>
> ReplicationManager keeps its in-flight replication and deletion actions only
> in memory; this state is not replicated through Ratis. So, in theory,
> starting ReplicationManager immediately after a failover could lead to data
> loss or over-replication.
> HDDS-4589 is a quick fix for the potential data loss issue; however, we
> still need a thorough solution for both in-flight add and in-flight delete.
> We have two proposals from [~sodonnell]:
>  # Have the DNs provide a list of pending_delete blocks in their container
> report / heartbeat, which SCM can then use.
>  # If the DNs detect a new master SCM or a restarted SCM, have them purge
> their pending delete list and wait for new instructions from the
> new/restarted SCM.
> Filing this Jira to record the problem.
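
As a rough illustration of the first proposal, the sketch below shows how a newly elected SCM might rebuild its in-flight delete view from datanode reports before deciding on further deletes. It is not Ozone code: the class name, method names, and the choice to key pending deletes by container ID are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these names are real Ozone APIs. */
final class InflightRebuildSketch {

  // containerId -> datanodes that currently report a pending delete for it
  private final Map<Long, Set<String>> inflightDeletes = new HashMap<>();

  /** Replace what we know about one datanode with its latest report. */
  public void onReport(String datanodeId, List<Long> pendingDeleteContainers) {
    // Drop stale entries for this datanode, then re-add what it reports now.
    inflightDeletes.values().forEach(dns -> dns.remove(datanodeId));
    inflightDeletes.values().removeIf(Set::isEmpty);
    for (long id : pendingDeleteContainers) {
      inflightDeletes.computeIfAbsent(id, k -> new HashSet<>()).add(datanodeId);
    }
  }

  /** Replicas expected to remain once the reported pending deletes finish. */
  public int effectiveReplicaCount(long containerId, int reportedReplicas) {
    Set<String> dns = inflightDeletes.get(containerId);
    int pending = (dns == null) ? 0 : dns.size();
    return reportedReplicas - pending;
  }

  /** Only issue another delete if the container would still be over-replicated. */
  public boolean safeToDelete(long containerId, int reportedReplicas,
                              int replicationFactor) {
    return effectiveReplicaCount(containerId, reportedReplicas) > replicationFactor;
  }
}
```

With a replication factor of 3 and four reported replicas, one of which a datanode lists as pending delete, `safeToDelete` returns false, so a freshly failed-over SCM would not issue a second delete for the same container. The second proposal would avoid this bookkeeping by having datanodes discard their pending deletes when they detect the SCM change.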



--
This message was sent by Atlassian Jira
(v8.3.4#803005)