Re: [I] Consider adding caching to more ID objects. [accumulo]

via GitHub Fri, 08 Mar 2024 09:04:36 -0800


ctubbsii commented on issue #4316:
URL: https://github.com/apache/accumulo/issues/4316#issuecomment-1986075875


   The purpose of the existing caches is de-duplication, not rapid on-demand 
access, like we think of most caches. When these IDs were just String types, we 
could use something similar to String.intern, but as a concrete type, we need 
the weak values cache to de-duplicate the wrapper object around the String 
canonical form. Expiring due to size or time would only cause duplicates to 
start appearing, which defeats the purpose of the cache. Sure, it would keep 
the cache smaller, but the objects already get removed from the cache when they 
are garbage collected, and removing the reference in the cache isn't going to 
remove the memory used by the object that's still in the JVM. I believe we 
already have unit tests to make sure the weak values caching mechanism is 
working correctly with garbage collection.
   
   So, for the existing caches, I don't think any change needs to be made.
   
   That said, this ticket is about caching more types, not necessarily changing 
the existing caching. In that context, it's important to note that the 
de-duplication caches really only benefit objects that are frequently 
duplicated (regardless of cardinality, though types with low-cardinality 
relative to the frequency of object creation will benefit the most from 
de-duplication). These typically come from instantiation of the wrapper 
concrete type when deserializing the canonical representation from user input, 
ZooKeeper, the metadata table, or similar. Things like TableId and NamespaceId 
clearly benefit from these. However, not all of these ID types are prone to 
duplication due to deserialization of their canonical forms. Each one would 
have to be evaluated to determine if they are subject to frequent duplication, 
and would benefit from a de-duplication cache, on a case-by-case basis.
   
   It's also possible that some of these would benefit from a different kind of 
cache that might have something like time or size limits, but again, that would 
require each type to be evaluated on a case-by-case basis in the context of 
that cache's purpose.
   
   If a cache is added, it should be done because it's clear that it would 
benefit from a the cache being added, whether it's a de-duplication cache to 
limit memory usage for frequently deserialized IDs with low cardinality, or if 
it's a cache for some other purpose.
   
   Right now, I'm not sure we have any reason to believe any of these other ID 
types would benefit from additional caching. So, we could probably just close 
this issue, or somebody could dive in and do some experimentation and analysis 
to see if the other types would benefit from a cache. I suspect that it's 
unlikely they will benefit, but have not done the analysis myself, so it's only 
a suspicion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Consider adding caching to more ID objects. [accumulo]

Reply via email to