[GitHub] [ozone] smengcl opened a new pull request, #4568: HDDS-7935. [Snapshot] LRU Cache entries may get evicted/closed during long running processes

via GitHub Fri, 14 Apr 2023 08:07:36 -0700


smengcl opened a new pull request, #4568:
URL: https://github.com/apache/ozone/pull/4568


   ## What changes were proposed in this pull request?
   
   EXPERIMENTAL.
   
   This is approach 2 of 2 that might fix the issue affecting SnapDiff, 
where the current `LoadingCache` behaves like a simple LRU cache. We have no 
control over when an `OmSnapshot` instance is evicted and closed, which can 
cause the snapshot DB instance to be **closed prematurely** while SnapDiff 
is still running in the background, crashing the OM.
   
   For approach 1, which implements a custom `SnapshotCache` and the whole 
modified-LRU logic from scratch, see #4567.
   
   Approach 2 replaces the hard limit (`.maximumSize()`) with 
`.weakValues()`. This allows the JVM garbage collector to collect a value once 
it is no longer strongly referenced, for instance by a running SnapDiff job or 
a Hadoop FS API read operation.
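
   To illustrate the weak-value behaviour described above, here is a minimal, 
single-threaded sketch built only on `java.lang.ref.WeakReference`. It is not 
the code in this PR and not how `LoadingCache` is implemented internally; 
`WeakValueCacheSketch` and its loader are hypothetical names used for 
illustration only.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical stand-in for a weak-valued cache. Illustration only:
 * not thread-safe, and not the Ozone or cache-library implementation.
 */
final class WeakValueCacheSketch<K, V> {
  // Values are held only weakly, so the map alone never keeps them alive.
  private final Map<K, WeakReference<V>> entries = new HashMap<>();
  private final Function<K, V> loader;

  WeakValueCacheSketch(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, reloading it if the GC has cleared it. */
  V get(K key) {
    WeakReference<V> ref = entries.get(key);
    V value = (ref == null) ? null : ref.get();
    if (value == null) {
      // Either never loaded, or collected because nobody held it strongly.
      value = loader.apply(key);
      entries.put(key, new WeakReference<>(value));
    }
    return value;
  }
}
```

   As long as a caller (SnapDiff, in the scenario above) keeps a strong 
reference to the returned value, the collector cannot clear it, so a later 
`get()` for the same key returns the same instance instead of reloading.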
   
   `ozone.om.snapshot.cache.max.size` effectively becomes a soft limit (the 
same as in approach 1), with a warning printed in `checkForSnapshot()` when 
the cache size exceeds it.
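
   The soft-limit check can be pictured roughly as follows. This is a sketch 
under assumptions, not the actual body of `checkForSnapshot()`: the class 
name, method name, and log message are hypothetical, and 
`java.util.logging` is used only to keep the example self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical helper; illustration only, not the code in this PR. */
final class SoftLimitCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(SoftLimitCheckSketch.class.getName());

  private SoftLimitCheckSketch() { }

  /**
   * Logs a warning and returns true when the cache holds more entries
   * than the soft limit. Nothing is evicted: the limit is advisory only.
   */
  static boolean exceedsSoftLimit(long currentSize, long softLimit) {
    if (currentSize > softLimit) {
      LOG.warning(String.format(
          "Snapshot cache size (%d) exceeds soft limit (%d)",
          currentSize, softLimit));
      return true;
    }
    return false;
  }
}
```

   The point of the design is that exceeding the limit only produces a log 
line; entries that are still in use are never closed to bring the size back 
down.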
   
   This is not fully tested yet. If it works as expected, it is much cleaner 
than approach 1. In the worst case, we fall back to approach 1.
   
   cc @GeorgeJahad 
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-7935
   
   ## How was this patch tested?
   
   - All existing tests should pass.
   - Pending SnapDiff test additions that intentionally exceed the cache limit.
     - Possibly new test cases that trigger GC while SnapDiff is still running 
to see if it can still finish without crashing OM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

