[
https://issues.apache.org/jira/browse/HDDS-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDDS-11452:
-----------------------------------
Parent: HDDS-13747 (was: HDDS-12940)
> OmSnapshotPurgeRequest is not atomic and can lead to SnapshotChain Corruption
> -----------------------------------------------------------------------------
>
> Key: HDDS-11452
> URL: https://issues.apache.org/jira/browse/HDDS-11452
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
> Labels: pull-request-available
>
> OmSnapshotPurgeRequest updates the snapshot chain and also updates the
> cache, but these changes are not rolled back if anything fails. When a
> checked exception is thrown (anything from a proto exception to an arbitrary
> IOException), the request swallows the exception and returns an error
> response. The problem is that the snapshot info table cache is then only
> partially updated and no longer coherent with the snapshot chain, and none
> of these changes are flushed to disk. On restart this can lead to snapshot
> chain and snapshot info corruption.
> The proposal here is to make the entire request atomic:
> 1) Update the snapshot chain and keep the updated snapshot infos in local,
> uncommitted space.
> 2) In case of an exception, roll back all deleted snapshots by putting them
> back into the snapshot chain (this must be done in the reverse order of
> removal) and return an error response.
> 3) If no exception is thrown, update the snapshot info table cache.
> 4) Send it to the double buffer.
> cc: [~hemantk] [~ppogde]
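
The four proposed steps can be sketched roughly as below. This is a minimal, simplified illustration, not the Ozone implementation: the class AtomicPurgeSketch, the toy Chain type, the string-valued cache, and the "PURGED" marker are all hypothetical stand-ins, and step 4 (the double buffer hand-off) is only indicated by a comment. It mainly shows why the undo log has to be replayed in reverse order of removal.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model; none of these names are Ozone APIs. */
final class AtomicPurgeSketch {

  /** Toy stand-in for the snapshot chain: an ordered list of snapshot IDs. */
  static final class Chain {
    final List<String> ids = new ArrayList<>();

    /** Removes id and returns its old position so the removal can be undone. */
    int remove(String id) throws IOException {
      int index = ids.indexOf(id);
      if (index < 0) {
        throw new IOException("Snapshot not in chain: " + id);
      }
      ids.remove(index);
      return index;
    }

    /** Puts id back at the position it occupied before removal. */
    void restore(int index, String id) {
      ids.add(index, id);
    }
  }

  /** Undo record for one successful removal. */
  static final class Removal {
    final int index;
    final String id;

    Removal(int index, String id) {
      this.index = index;
      this.id = id;
    }
  }

  /** Returns true if every snapshot was purged, false if the request was rolled back. */
  static boolean purge(Chain chain, Map<String, String> infoCache, List<String> toPurge) {
    Deque<Removal> undoLog = new ArrayDeque<>();        // most recent removal on top
    Map<String, String> uncommitted = new HashMap<>();  // step 1: local, uncommitted space
    try {
      for (String id : toPurge) {
        int index = chain.remove(id);
        undoLog.push(new Removal(index, id));
        uncommitted.put(id, "PURGED");
      }
    } catch (IOException e) {
      // Step 2: undo in reverse order of removal, so every saved index is
      // applied to the same chain state it was recorded against.
      while (!undoLog.isEmpty()) {
        Removal r = undoLog.pop();
        chain.restore(r.index, r.id);
      }
      return false; // the caller would build the error response here
    }
    // Step 3: publish to the shared cache only after every removal succeeded.
    infoCache.putAll(uncommitted);
    // Step 4 (hand-off to the double buffer) is outside the scope of this sketch.
    return true;
  }
}
```

In this toy model, restoring in forward order would reinsert entries at indexes that were recorded against a different chain state, which is why the undo log is a stack.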