[
https://issues.apache.org/jira/browse/HDDS-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell resolved HDDS-5267.
-------------------------------------
Fix Version/s: 1.3.0
Resolution: Fixed
> Full Container Report can remove replicas added by an Incremental Report
> ------------------------------------------------------------------------
>
> Key: HDDS-5267
> URL: https://issues.apache.org/jira/browse/HDDS-5267
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode, SCM
> Affects Versions: 1.1.0
> Reporter: Stephen O'Donnell
> Assignee: Ritesh H Shukla
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.3.0
>
>
> In HDDS-5249, I highlighted an issue between Incremental and Full container
> reports. This follow-up Jira is to track the second problem mentioned in
> that Jira.
> After HDDS-5249, the report processing for a given DN on SCM is synchronised
> so only 1 report can be processed at a time for a given DN.
> We can still have the following scenario:
> 1. FCR generated on DN, including containers up to ID 1000.
> 2. At the same time ICR generated on DN for container 1001.
> 3. The ICR is processed first on SCM, adding 1001.
> 4. The FCR is processed, and this will cause the reference to 1001 to be
> removed as it is not in the FCR.
> 5. About 60 - 90 seconds later another FCR will be generated which will
> correct the issue.
> As things stand, there is no locking on the DN to ensure that an FCR and an
> ICR cannot be generated at the same time.
> There is also no way to know whether a given ICR is contained in a given FCR.
> One way to fix this problem would be:
> 1. Introduce some locking in the DN to ensure that FCR, ICR and new container
> creation are serialized.
> 2. Introduce an increasing sequence number which is assigned to each FCR and
> ICR. If a report has a greater sequence than another one, then it supersedes
> the smaller one.
> E.g.:
> ICR #seq=100, container=1001, FCR #seq=99. In this case, the FCR will not
> have container 1001.
> ICR #seq=99, container=1001, FCR #seq=100. In this case, the FCR is
> guaranteed to have container 1001.
> Then we need to figure out a way on the DNs to use this information. One way
> would be to attach the report sequence number to each replica, and only
> remove a replica if its sequence is less than the current report sequence.
> However, that would add some memory overhead to SCM, so it is worth looking
> into alternatives.
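
The sequence-number idea described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix that was merged: the class `SequencedReplicaTracker` and its methods are hypothetical names, it models the replicas of a single DN, and it assumes every report carries a unique, increasing sequence number assigned on the DN.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-replica report sequence numbers for one DN.
// None of these names come from the Ozone code base.
final class SequencedReplicaTracker {
    // containerId -> sequence number of the newest report that mentioned it
    private final Map<Long, Long> replicaSeq = new HashMap<>();

    // An ICR adds (or refreshes) a single replica.
    void processIncremental(long seq, long containerId) {
        replicaSeq.merge(containerId, seq, Long::max);
    }

    // An FCR lists every container the DN knew about when it was generated.
    void processFull(long seq, Set<Long> reported) {
        // Drop a replica only if it is missing from the FCR AND was last
        // reported with a smaller sequence, i.e. the FCR supersedes it.
        replicaSeq.entrySet().removeIf(
            e -> e.getValue() < seq && !reported.contains(e.getKey()));
        for (long id : reported) {
            replicaSeq.merge(id, seq, Long::max);
        }
    }

    boolean hasReplica(long containerId) {
        return replicaSeq.containsKey(containerId);
    }

    public static void main(String[] args) {
        Set<Long> fcr = new HashSet<>(Arrays.asList(1L, 1000L));

        // ICR #seq=100 is processed before FCR #seq=99: 1001 is kept.
        SequencedReplicaTracker a = new SequencedReplicaTracker();
        a.processIncremental(100, 1001);
        a.processFull(99, fcr);
        System.out.println(a.hasReplica(1001)); // prints "true"

        // ICR #seq=99 is older than FCR #seq=100: an FCR without 1001
        // means the replica is really gone, so it is removed.
        SequencedReplicaTracker b = new SequencedReplicaTracker();
        b.processIncremental(99, 1001);
        b.processFull(100, fcr);
        System.out.println(b.hasReplica(1001)); // prints "false"
    }
}
```

The map holds one extra `long` per replica, which is the SCM memory overhead the description mentions as a reason to look at alternatives.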