smengcl opened a new pull request, #4568: URL: https://github.com/apache/ozone/pull/4568
## What changes were proposed in this pull request?

EXPERIMENTAL.

This is approach 2 of 2 that might fix the issue affecting SnapDiff, where the current `LoadingCache` behaves like a simple LRU cache. We have no control over when an `OmSnapshot` instance is evicted and closed, which can cause the snapshot DB instance to be **closed prematurely** while SnapDiff is still running in the background, crashing the OM.

For approach 1, which implements a custom `SnapshotCache` and the whole modified-LRU logic from scratch, see #4567.

This approach 2 replaces the hard limit (`.maximumSize()`) with `.weakValues()`. This allows the JVM garbage collector to collect a value once it is no longer strongly referenced, for instance from SnapDiff or Hadoop FS API read operations. `ozone.om.snapshot.cache.max.size` effectively becomes a soft limit (the same as in approach 1), with a warning printed in `checkForSnapshot()` when the cache size exceeds the soft limit.

This is not fully tested yet. If it works as expected, it is much cleaner than approach 1. In the worst case, we fall back to approach 1.

cc @GeorgeJahad

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7935

## How was this patch tested?

- All existing tests should pass.
- Pending SnapDiff test additions that intentionally exceed the cache limit.
- Possibly new test cases that trigger GC while SnapDiff is still running, to see if it can still finish without crashing the OM.
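
For readers unfamiliar with weak-valued caches, the reachability idea behind approach 2 can be sketched in plain Java. This is only an illustration, not the Guava `CacheBuilder` code in this patch: the class `WeakValueCacheSketch`, its `softLimit` field, and `exceedsSoftLimit()` are hypothetical names, and the sketch does not close any DB handle when a value is collected.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a weak-values cache; not Ozone or Guava code. */
class WeakValueCacheSketch<K, V> {
  private final Map<K, WeakReference<V>> map = new HashMap<>();
  private final int softLimit;          // assumed analogue of the cache max size
  private final Function<K, V> loader;  // assumed loader, e.g. opens a snapshot

  WeakValueCacheSketch(int softLimit, Function<K, V> loader) {
    this.softLimit = softLimit;
    this.loader = loader;
  }

  /** Returns the cached value, reloading it if the GC already cleared it. */
  synchronized V get(K key) {
    WeakReference<V> ref = map.get(key);
    V value = (ref == null) ? null : ref.get();
    if (value == null) {
      value = loader.apply(key);
      map.put(key, new WeakReference<>(value));
    }
    return value;
  }

  /** Number of entries whose values have not been cleared by the GC. */
  synchronized int size() {
    map.values().removeIf(ref -> ref.get() == null);
    return map.size();
  }

  /** True when more live entries exist than the soft limit allows. */
  synchronized boolean exceedsSoftLimit() {
    return size() > softLimit;
  }

  public static void main(String[] args) {
    WeakValueCacheSketch<String, StringBuilder> cache =
        new WeakValueCacheSketch<>(1, key -> new StringBuilder("db:" + key));

    // Callers keep strong references while they work, as SnapDiff would.
    StringBuilder first = cache.get("snap1");
    StringBuilder second = cache.get("snap2");

    // Nothing is force-evicted at the limit; the cache can only warn.
    if (cache.exceedsSoftLimit()) {
      System.err.println("WARN: cache size exceeds soft limit");
    }

    // A value that is still strongly referenced is never cleared,
    // so the same instance comes back.
    System.out.println(cache.get("snap1") == first);
    System.out.println(second.length() > 0); // keeps `second` in use
  }
}
```

The design trade-off is the one the description points at: nothing can be evicted while a caller still holds it, but the number of live entries is no longer bounded, so the configured size can only trigger a warning.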
