George Jahad created HDDS-7935:
----------------------------------

             Summary: LRU Cache entries may get evicted/closed during long running processes
                 Key: HDDS-7935
                 URL: https://issues.apache.org/jira/browse/HDDS-7935
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: George Jahad


Because of the way the snapshot LRU cache is implemented, evicting the oldest 
snapshot also closes the corresponding RocksDB instance: 
https://github.com/apache/ozone/blob/3f7ded2a34c0c35b89901e222ceaee0d1fdf08b6/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java#L124

That is probably fine for short-lived tasks, like users reading snapshots, but 
it is probably not safe for long-lived tasks like snap diff and possibly 
snapshot delete.

The problem is that a snapshot's position in the cache is only refreshed when 
the snapshot is first retrieved from the cache; subsequent reads from the 
snapshot itself don't refresh it. Thus a RocksDB instance can be evicted and 
closed in the middle of snap diff processing.
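To make the failure mode concrete, here is a minimal sketch in Java. It is not the Ozone code: SnapshotHandle and SnapshotLruCache are hypothetical stand-ins, and the sketch assumes an access-ordered java.util.LinkedHashMap whose removeEldestEntry hook closes the evicted handle. A task that holds a handle and keeps reading through it never touches the map, so the handle ages out and is closed underneath it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a snapshot's RocksDB instance; not an Ozone class.
class SnapshotHandle implements AutoCloseable {
    private final String name;
    private boolean closed = false;

    SnapshotHandle(String name) { this.name = name; }

    // Reads go straight to the handle, so they never refresh the cache entry.
    String read(String key) {
        if (closed) {
            throw new IllegalStateException("snapshot " + name + " is closed");
        }
        return name + ":" + key;
    }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

// Hypothetical LRU cache that closes an entry as soon as it is evicted.
class SnapshotLruCache {
    private final LinkedHashMap<String, SnapshotHandle> map;

    SnapshotLruCache(int capacity) {
        // accessOrder = true: a get() moves the entry to the most-recently-used end.
        this.map = new LinkedHashMap<String, SnapshotHandle>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SnapshotHandle> eldest) {
                if (size() > capacity) {
                    eldest.getValue().close(); // closed even if a task still holds it
                    return true;
                }
                return false;
            }
        };
    }

    // Only this lookup refreshes recency.
    SnapshotHandle get(String name) {
        SnapshotHandle h = map.get(name);
        if (h == null) {
            h = new SnapshotHandle(name);
            map.put(name, h); // may trigger removeEldestEntry
        }
        return h;
    }
}

class LruEvictionDemo {
    public static void main(String[] args) {
        SnapshotLruCache cache = new SnapshotLruCache(2);
        SnapshotHandle diffSource = cache.get("snap1"); // long-running task grabs snap1
        diffSource.read("k1");                          // works, but does not refresh snap1
        cache.get("snap2");                             // other users open snapshots
        cache.get("snap3");                             // evicts and closes snap1
        System.out.println(diffSource.isClosed());      // prints true
    }
}
```

Any further diffSource.read(...) in the demo would throw, which is the mid-task failure described above.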

One alternative I can think of is to add some kind of reference-counting 
scheme so that RocksDB instances aren't closed automatically on eviction.
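A reference-counting scheme could look roughly like the sketch below. RefCountedSnapshot, acquire(), release() and markEvicted() are hypothetical names, not existing Ozone APIs. The assumption is that the cache's eviction hook calls markEvicted() instead of closing the instance, and the real close only happens once the snapshot is both evicted and no longer referenced by any task.

```java
// Hypothetical sketch: eviction defers the close until the last user releases the snapshot.
class RefCountedSnapshot {
    private final String name;
    private int refCount = 0;
    private boolean evicted = false;
    private boolean closed = false;

    RefCountedSnapshot(String name) { this.name = name; }

    // Called by a task (e.g. snap diff) before it starts using the snapshot.
    synchronized RefCountedSnapshot acquire() {
        if (closed) {
            throw new IllegalStateException(name + " is already closed");
        }
        refCount++;
        return this;
    }

    // Called by the task when it is done with the snapshot.
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException(name + " was not acquired");
        }
        refCount--;
        closeIfUnused();
    }

    // Called from the cache's eviction hook instead of closing directly.
    synchronized void markEvicted() {
        evicted = true;
        closeIfUnused();
    }

    private void closeIfUnused() {
        if (evicted && refCount == 0 && !closed) {
            closed = true; // a real implementation would close the RocksDB instance here
        }
    }

    synchronized boolean isClosed() { return closed; }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCountedSnapshot snap = new RefCountedSnapshot("snap1");
        snap.acquire();                      // snap diff starts using snap1
        snap.markEvicted();                  // the cache evicts snap1 mid-task
        System.out.println(snap.isClosed()); // prints false: still referenced
        snap.release();                      // snap diff finishes
        System.out.println(snap.isClosed()); // prints true
    }
}
```

The trade-off is that every user, including short-lived readers, must pair acquire() with release(); a leaked reference would keep the instance open indefinitely.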

Another possibility is to have an entirely separate pool of snapshot entries, 
outside of the cache, that are explicitly opened and closed by long-running 
tasks like snap diff.
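A separate pool might be sketched as follows, again under stated assumptions: DedicatedSnapshotPool, Handle and snapDiff are hypothetical names, the reads are placeholders, and each Handle is assumed to stand for its own RocksDB instance rather than the one held by the LRU cache. Because nothing in the pool is ever evicted, the task that opens a handle owns its lifetime, and try-with-resources guarantees it is closed when the task ends.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical pool of snapshot handles that live outside the LRU cache.
// Nothing here is evicted; whoever opens a handle is responsible for closing it.
class DedicatedSnapshotPool {
    private final Set<Handle> open = new HashSet<>();

    final class Handle implements AutoCloseable {
        final String snapshotName;
        private boolean closed = false;

        private Handle(String snapshotName) { this.snapshotName = snapshotName; }

        String read(String key) {
            if (closed) {
                throw new IllegalStateException(snapshotName + " is closed");
            }
            return snapshotName + ":" + key; // placeholder for a real RocksDB read
        }

        @Override
        public void close() {
            synchronized (DedicatedSnapshotPool.this) {
                if (!closed) {
                    closed = true; // a real implementation would close its own RocksDB instance
                    open.remove(this);
                }
            }
        }
    }

    // Opens a dedicated handle, independent of any cached instance.
    synchronized Handle open(String snapshotName) {
        Handle h = new Handle(snapshotName);
        open.add(h);
        return h;
    }

    synchronized int openCount() { return open.size(); }
}

class PoolDemo {
    // Hypothetical long-running task; cache activity cannot close a or b here.
    static String snapDiff(DedicatedSnapshotPool pool, String from, String to) {
        try (DedicatedSnapshotPool.Handle a = pool.open(from);
             DedicatedSnapshotPool.Handle b = pool.open(to)) {
            return a.read("k") + " vs " + b.read("k");
        }
    }

    public static void main(String[] args) {
        DedicatedSnapshotPool pool = new DedicatedSnapshotPool();
        System.out.println(snapDiff(pool, "snap1", "snap2")); // prints snap1:k vs snap2:k
        System.out.println(pool.openCount());                 // prints 0
    }
}
```

Compared with reference counting, this keeps the cache unchanged, at the cost of possibly having two instances open for the same snapshot at once.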



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
