[ https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
István Fajth resolved HDDS-2696.
--------------------------------
Resolution: Won't Fix
I don't think I have seen this problem arise again; however, the problem in
HDDS-2647 should be evaluated, as it might still be present and should be fixed
independently of this JIRA.
I am closing this one; if we see this issue again, we might re-open it.
> Document recovery from RATIS-677
> --------------------------------
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: István Fajth
> Assignee: István Fajth
> Priority: Critical
> Labels: Triaged
>
> RATIS-677 was solved in a way that requires a setting to be changed so that
> the RatisServer implementation ignores the corruption, and at the moment, due
> to HDDS-2647, we do not have a clear recovery path from Ratis corruption in
> the pipeline data.
> We should document how this can be recovered. I have an idea that involves
> closing the pipeline in SCM and removing the Ratis metadata for the pipeline
> on the DataNodes, which effectively clears the corrupted pipeline out of the
> system.
> There are two problems I have with finding a recovery path and documenting it:
> - I am not sure we have strong enough guarantees that the writes happened
> properly if the Ratis metadata could become corrupt, so this needs to be
> investigated.
> - At the moment I cannot validate this approach: if I do the steps (stop
> the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with
> scmcli, then restart the DNs), the pipeline is not closed properly, and SCM
> fails as described in HDDS-2695.
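The "move out ratis data for pipeline" step from the procedure above can be sketched in Python. This is a minimal sketch only. It assumes that each pipeline's Ratis data lives in a directory named after the pipeline UUID directly under the DataNode's Ratis storage directory. Neither that layout nor the function name `move_pipeline_ratis_dir` comes from this issue; both are hypothetical. As the description says, the overall procedure was not validated. The sketch moves the data aside rather than deleting it, so it can be restored, and it is meant to run only while the DataNode is stopped.

```python
import shutil
import time
import uuid
from pathlib import Path


def move_pipeline_ratis_dir(storage_dir: str, pipeline_id: str, backup_root: str) -> Path:
    """Move one pipeline's Ratis directory out of the Ratis storage directory.

    Hypothetical helper. ASSUMPTION: the Ratis group directory is named after
    the pipeline UUID (lowercase, hyphenated) directly under storage_dir.
    The data is moved, not deleted, so it can be put back later.
    Only run this while the DataNode process is stopped.
    """
    # Validate and normalise the pipeline ID; raises ValueError if it is not a UUID.
    group_dir_name = str(uuid.UUID(pipeline_id))
    src = Path(storage_dir) / group_dir_name
    if not src.is_dir():
        raise FileNotFoundError(
            f"no Ratis directory for pipeline {group_dir_name} under {storage_dir}"
        )

    backup = Path(backup_root)
    backup.mkdir(parents=True, exist_ok=True)
    # Timestamp suffix keeps earlier backups of the same pipeline intact.
    dest = backup / f"{group_dir_name}.{int(time.time())}"
    if dest.exists():
        # Refuse to move into an existing path (shutil.move would nest src inside it).
        raise FileExistsError(f"backup target already exists: {dest}")
    shutil.move(str(src), str(dest))
    return dest
```

For example, `move_pipeline_ratis_dir("/data/ratis", "<pipeline-id>", "/data/ratis-backup")` would be run on each of the three DataNodes after stopping them; both paths are hypothetical placeholders for the actual Ratis storage and backup locations.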
--
This message was sent by Atlassian Jira
(v8.20.10#820010)