[
https://issues.apache.org/jira/browse/HDDS-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Swaminathan Balachandran updated HDDS-11452:
--------------------------------------------
Description:
OmSnapshotPurgeRequest updates the snapshot chain and also updates the cache,
even in case of a failure. If a checked exception is thrown, the request
swallows the exception and returns an error response. The problem is that this
leaves a partially updated snapshot info table cache, which is not coherent
with the snapshot chain and won't be flushed to disk. On restart this could
lead to all sorts of snapshot chain & snapshot info corruption.
The proposal here is to make the entire request atomic:
1) Update the snapshot chain & maintain the updated snapshot infos in local
uncommitted space.
2) In case of an exception, roll back all deleted snapshots by putting them
back into the snapshot chain & return an error response.
3) If no exception is thrown, update the snapshot info table cache.
4) Send it to the double buffer.
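The steps above could be sketched roughly as follows. This is only an
illustration of the stage/rollback/commit pattern, not the actual Ozone code:
AtomicPurgeSketch, Removed, addSnapshot and purge are hypothetical names, the
chain is modelled as a plain ordered list of snapshot IDs, and the "snapshot
info" is reduced to a map from each snapshot ID to its previous snapshot ID.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the purge flow; not Ozone's real classes. */
public class AtomicPurgeSketch {
  /** Remembers where a snapshot sat in the chain so it can be put back. */
  static final class Removed {
    final int index;
    final String id;
    Removed(int index, String id) { this.index = index; this.id = id; }
  }

  // Assumed model: ordered chain of IDs, and a cache of ID -> previous ID.
  final List<String> chain = new ArrayList<>();
  final Map<String, String> infoCache = new HashMap<>();

  void addSnapshot(String id) {
    infoCache.put(id, chain.isEmpty() ? null : chain.get(chain.size() - 1));
    chain.add(id);
  }

  /** Returns true on success; false stands in for the error response. */
  boolean purge(List<String> toPurge) {
    Map<String, String> staged = new HashMap<>(); // local uncommitted space
    Deque<Removed> removed = new ArrayDeque<>();
    try {
      // Step 1: update the chain, but only stage the snapshot info changes.
      for (String id : toPurge) {
        int idx = chain.indexOf(id);
        if (idx < 0) {
          throw new IOException("Unknown snapshot: " + id);
        }
        chain.remove(idx);
        removed.push(new Removed(idx, id));
        if (idx < chain.size()) {
          // The successor now points at the purged snapshot's predecessor.
          staged.put(chain.get(idx), idx > 0 ? chain.get(idx - 1) : null);
        }
      }
    } catch (IOException e) {
      // Step 2: put the removed snapshots back in reverse order; the cache
      // was never touched, so chain and cache stay coherent.
      while (!removed.isEmpty()) {
        Removed r = removed.pop();
        chain.add(r.index, r.id);
      }
      return false;
    }
    // Step 3: no exception, so commit the staged infos to the cache.
    infoCache.putAll(staged);
    for (String id : toPurge) {
      infoCache.remove(id);
    }
    // Step 4 (handing the result to the double buffer) is out of scope here.
    return true;
  }
}
```

The point of the sketch is that the cache is written in exactly one place,
after the loop that can throw, so a failed request leaves both the chain and
the cache as they were before the request started.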
cc: [~hemantk] [~ppogde]
> OmSnapshotPurgeRequest is not atomic and can lead to SnapshotChain Corruption
> -----------------------------------------------------------------------------
>
> Key: HDDS-11452
> URL: https://issues.apache.org/jira/browse/HDDS-11452
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major