[
https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mingchao zhao updated HDDS-2696:
--------------------------------
Target Version/s: 1.4.0 (was: 1.3.0)
> Document recovery from RATIS-677
> --------------------------------
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: István Fajth
> Priority: Critical
> Labels: Triaged
>
> RATIS-677 was solved in a way that requires changing a setting so that the
> RatisServer implementation ignores the corruption. Because of this, and at
> the moment also because of HDDS-2647, we do not have a clear recovery path
> from Ratis corruption in the pipeline data.
> We should document how to recover from this. I have an idea that involves
> closing the pipeline in SCM and removing the Ratis metadata for the pipeline
> on the DataNodes, which effectively clears the corrupted pipeline out of the
> system.
> I have two problems with finding and documenting a recovery path:
> - I am not sure whether we have strong enough guarantees that the writes
> happened properly if the Ratis metadata can become corrupt, so this needs to
> be investigated.
> - At the moment I cannot validate this approach: when I do the steps (stop
> the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with
> scmcli, then restart the DNs), the pipeline is not closed properly, and SCM
> fails as described in HDDS-2695.
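
The "move out ratis data for pipeline" step from the description could be sketched roughly as follows. This is a minimal illustration, not a validated recovery procedure: the function name `move_out_ratis_dir` and its parameters are hypothetical, and it assumes that each pipeline's Ratis data sits in a subdirectory of the DataNode's Ratis storage directory named after the pipeline ID. The issue itself does not confirm that layout.

```python
import shutil
from pathlib import Path


def move_out_ratis_dir(ratis_storage_dir: str, pipeline_id: str, backup_dir: str) -> Path:
    """Move one pipeline's Ratis directory out of a DataNode's Ratis storage dir.

    Assumption (not confirmed by the issue): the per-pipeline Ratis data lives
    in a subdirectory named after the pipeline ID. Run this only while the
    DataNode process is stopped.
    """
    source = Path(ratis_storage_dir) / pipeline_id
    if not source.is_dir():
        raise FileNotFoundError(f"no Ratis directory for pipeline at {source}")

    target_root = Path(backup_dir)
    target_root.mkdir(parents=True, exist_ok=True)

    target = target_root / pipeline_id
    if target.exists():
        # Refuse to overwrite an earlier backup of the same pipeline.
        raise FileExistsError(f"backup already exists at {target}")

    # Move rather than delete, so the data stays available for later analysis.
    shutil.move(str(source), str(target))
    return target
```

In the sequence described above, this would run on each of the three stopped DataNodes. Closing the pipeline in SCM and restarting the DataNodes remain separate steps, and per the description the overall approach still fails as reported in HDDS-2695.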
--
This message was sent by Atlassian Jira
(v8.20.10#820010)