[
https://issues.apache.org/jira/browse/HDDS-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-7935:
---------------------------------
Labels: pull-request-available (was: )
> [Snapshot] LRU Cache entries may get evicted/closed during long running
> processes
> ---------------------------------------------------------------------------------
>
> Key: HDDS-7935
> URL: https://issues.apache.org/jira/browse/HDDS-7935
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: George Jahad
> Assignee: Siyao Meng
> Priority: Major
> Labels: pull-request-available
>
> The way the snapshot LRU cache is implemented, when the oldest snapshot is
> evicted, the corresponding RocksDB instance is closed:
> https://github.com/apache/ozone/blob/3f7ded2a34c0c35b89901e222ceaee0d1fdf08b6/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java#L124
> That is probably fine for short-lived tasks like users reading snapshots, but
> is probably not safe for long-lived tasks like snapdiff and maybe snapshot
> delete.
> The problem is that a cache entry's recency is currently only refreshed when
> the snapshot is initially retrieved from the cache; subsequent reads from the
> snapshot itself don't refresh it. Thus it is possible for a RocksDB instance
> to be evicted and closed in the middle of snapdiff processing.
> One alternative I can think of is to add some kind of reference counting
> scheme so that RocksDB instances that are still in use aren't closed
> automatically on eviction.
> Another possibility is to have an entirely separate pool of snapshot entries,
> outside of the cache, that are explicitly opened and closed by long-running
> tasks like snapdiff.
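
For illustration only, the reference counting alternative could look roughly
like the sketch below. It is not the Ozone implementation: the class
RefCountedLruCache, its Lease type, and the loader function are hypothetical
names, and the value type V merely stands in for a RocksDB-backed snapshot
handle. The assumption made here is that eviction skips any entry that still
has outstanding leases, so the cache may temporarily exceed its capacity
instead of closing an instance that is in use.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: an LRU cache that never closes a value while it is leased. */
final class RefCountedLruCache<K, V extends AutoCloseable> {

  /** Internal bookkeeping: the cached value plus the number of open leases. */
  private static final class Entry<V> {
    final V value;
    int refCount;
    Entry(V value) { this.value = value; }
  }

  /** Handed to callers; closing it releases the reference, not the value itself. */
  final class Lease implements AutoCloseable {
    private final Entry<V> entry;
    private boolean released;
    private Lease(Entry<V> entry) { this.entry = entry; }
    V get() { return entry.value; }
    @Override public void close() { release(this); }
  }

  private final int capacity;
  private final Function<K, V> loader;
  // accessOrder=true: iteration runs from least to most recently used.
  private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>(16, 0.75f, true);

  RefCountedLruCache(int capacity, Function<K, V> loader) {
    this.capacity = capacity;
    this.loader = loader;
  }

  /** Returns a lease on the value for key, loading it on a miss. */
  synchronized Lease acquire(K key) {
    Entry<V> entry = map.get(key);
    if (entry == null) {
      entry = new Entry<>(loader.apply(key));
      map.put(key, entry);
    }
    entry.refCount++;
    evictIfNeeded();
    return new Lease(entry);
  }

  private synchronized void release(Lease lease) {
    if (lease.released) {
      return; // closing a lease twice is a no-op
    }
    lease.released = true;
    lease.entry.refCount--;
    evictIfNeeded();
  }

  /** Evicts least recently used entries, skipping any that are still leased. */
  private void evictIfNeeded() {
    Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
    while (map.size() > capacity && it.hasNext()) {
      Entry<V> entry = it.next().getValue();
      if (entry.refCount == 0) {
        it.remove();
        try {
          entry.value.close();
        } catch (Exception e) {
          throw new IllegalStateException("failed to close evicted value", e);
        }
      }
    }
  }
}
```

Under this sketch, a long-running task such as snapdiff would hold its lease in
a try-with-resources block for the whole computation, and the underlying
instance would only become eligible for closing once the last lease is
released.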
--
This message was sent by Atlassian Jira
(v8.20.10#820010)