ctubbsii commented on issue #4316: URL: https://github.com/apache/accumulo/issues/4316#issuecomment-1986075875
The purpose of the existing caches is de-duplication, not rapid on-demand access, like we think of most caches. When these IDs were just String types, we could use something similar to String.intern, but as a concrete type, we need the weak values cache to de-duplicate the wrapper object around the String canonical form. Expiring due to size or time would only cause duplicates to start appearing, which defeats the purpose of the cache. Sure, it would keep the cache smaller, but the objects already get removed from the cache when they are garbage collected, and removing the reference in the cache isn't going to remove the memory used by the object that's still in the JVM. I believe we already have unit tests to make sure the weak values caching mechanism is working correctly with garbage collection. So, for the existing caches, I don't think any change needs to be made. That said, this ticket is about caching more types, not necessarily changing the existing caching. In that context, it's important to note that the de-duplication caches really only benefit objects that are frequently duplicated (regardless of cardinality, though types with low-cardinality relative to the frequency of object creation will benefit the most from de-duplication). These typically come from instantiation of the wrapper concrete type when deserializing the canonical representation from user input, ZooKeeper, the metadata table, or similar. Things like TableId and NamespaceId clearly benefit from these. However, not all of these ID types are prone to duplication due to deserialization of their canonical forms. Each one would have to be evaluated to determine if they are subject to frequent duplication, and would benefit from a de-duplication cache, on a case-by-case basis. It's also possible that some of these would benefit from a different kind of cache that might have something like time or size limits, but again, that would require each type to be evaluated on a case-by-case basis in the context of that cache's purpose. If a cache is added, it should be done because it's clear that it would benefit from a the cache being added, whether it's a de-duplication cache to limit memory usage for frequently deserialized IDs with low cardinality, or if it's a cache for some other purpose. Right now, I'm not sure we have any reason to believe any of these other ID types would benefit from additional caching. So, we could probably just close this issue, or somebody could dive in and do some experimentation and analysis to see if the other types would benefit from a cache. I suspect that it's unlikely they will benefit, but have not done the analysis myself, so it's only a suspicion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
