Agree that Guava should probably be shaded, as they often release new major versions.
A proper caching structure is also needed to fix
https://issues.apache.org/jira/browse/JENA-901 - so a +1 to try Guava from me.

On 19 March 2015 at 18:23, Stephen Allen <[email protected]> wrote:
> On Thu, Mar 19, 2015 at 1:37 PM, Andy Seaborne <[email protected]> wrote:
>
>> This is to restart a discussion on a dependency for caching.
>>
>> Previously,
>> http://markmail.org/thread/em2anabal6rj4vjw
>> which is about Guava as a dependency.
>>
>> Some uses of long term caches:
>> TDB (see JENA-801)
>> Rules (see JENA-901)
>> ARQ
>>   DatasetGraphCaching, DatasetImpl (stop graph, model churn)
>>   org.apache.jena.riot.system.IRIResolver
>> Fuseki
>>   SPARQL Query Caching (JENA-626)
>> Core
>>   EnhGraph
>>
>> JENA-801 reports an experiment using the Guava cache code - we didn't get
>> this as a contribution. I assume the prototype was the code I pointed to
>> which is a straight replacement for the current implementation with Guava.
>> That experiment was an improvement but TDB would benefit from atomic
>> get-fill pattern support. TDB usage is very dependent on concurrency.
>>
>> Some current implementations Jena has:
>>   org.apache.jena.atlas.lib.Cache
>>   com.hp.hpl.jena.util.cache.Cache
>> as well as ad-hoc solutions using maps (JENA-901)
>>
>> None are thread-concurrent-safe for get-fill.
>>
>> == org.apache.jena.atlas.lib.Cache
>>
>> Most used is an LRU cache based on LinkedHashMap with a drop handler
>> wrapper. The direct mode TDB disk cache uses drop handlers to flush dirty
>> blocks to disk.
>>
>> == com.hp.hpl.jena.util.cache.Cache
>>
>> 2 impls:
>>   RandCache - not used in the codebase; created from its own tests.
>>   EnhancedNodeCache - finite sized, slot replacement on clash policy
>>
>> (this is possibly one of the costs for creating graphs - it's 1000 slots
>> of memory and Java clears everything first .... IIRC it used to be 5000)
>>
>> == The Guava interface:
>> The key operations are:
>>
>>   V getIfPresent(Object key);
>>   V get(K key, Callable<? extends V> valueLoader);
>>
>> The latter "get-and-load-if-absent" is thread-safe and atomic. It avoids
>> needing the pattern
>>   "lock", "get", "if absent put", "unlock"
>> which over-synchronizes to achieve a get and put in a coordinated fashion
>> (this goes beyond JENA-801).
>>
>> com.google.common.cache.Cache is:
>>
>> public interface Cache<K, V> {
>>   V getIfPresent(Object key);
>>   V get(K key, Callable<? extends V> valueLoader);
>>   ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
>>   void put(K key, V value);
>>   void putAll(Map<? extends K,? extends V> m);
>>   void invalidate(Object key);
>>   void invalidateAll(Iterable<?> keys);
>>   void invalidateAll();
>>   long size();
>>   CacheStats stats();
>>   ConcurrentMap<K, V> asMap();
>>   void cleanUp();
>> }
>>
>> and a CacheBuilder.
>>
>> == Other
>>
>> === Commons JCS
>>
>> This does have a memory-backed cache; it does not support a lock-efficient
>> "get-load" pattern. It does provide some degree of thread-parallel access
>> but it is a cache-wide lock. Maybe it's possible to use multiple regions
>> to shard the cache and get some truly parallel access but it looks like it
>> needs programming on top of the JCS artifact. The design centre of JCS
>> seems to be around larger, higher-cost objects than NodeId/Nodes let alone
>> small memory computation caches.
>>
>> ===
>>
>> What others, with the right license and the right design focus? There are
>> a lot of cache packages out there but the features of Guava look right to
>> me. Does anyone have experience of using any other cache package for real?
>>
>
>
> I've used the Guava cache before and I really like the design.
>
> We obviously may have some issues with it being a dependency as was
> discussed in the email thread you linked. For my usage, I would have no
> problem keeping my application up to date with the latest Guava (as long as
> Jena keeps up to date). But maybe we would need to shade it so other
> people won't get conflicting versions. Presumably when/if Jigsaw arrives
> we won't have to worry about this problem any more.
>
> If we do decide to make Guava a dependency, it would be nice (although a
> big change) to take a pass through and eliminate utility methods/classes
> that we have that are provided by Guava (I'm thinking of
> org.apache.jena.atlas.iterator.Iter specifically).
>
> -Stephen

-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
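
For readers following the thread, the atomic get-fill ("get-and-load-if-absent") pattern discussed in the quoted message can be sketched with the JDK alone. This is only an illustration, not Guava's implementation and not Jena code. The class name GetFillSketch is hypothetical, it swaps in java.util.concurrent.ConcurrentHashMap.computeIfAbsent (Java 8+) for Guava's get(key, valueLoader), and it has no eviction or size bound, so it is not a real cache.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of atomic get-fill; not Guava and not Jena code. */
final class GetFillSketch<K, V> {
    // Unbounded map: a real cache would also need eviction (LRU, size limit, ...).
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    /** Returns the cached value, or null if the key is absent. Never loads. */
    V getIfPresent(K key) {
        return map.get(key);
    }

    /**
     * Returns the cached value, loading and storing it if absent.
     * ConcurrentHashMap.computeIfAbsent does the check-and-fill atomically,
     * so callers need no external "lock, get, if absent put, unlock" sequence
     * and the loader is applied at most once per absent key.
     * The loader must not modify this map while it runs.
     */
    V get(K key, Callable<? extends V> valueLoader) {
        return map.computeIfAbsent(key, k -> {
            try {
                return valueLoader.call();
            } catch (Exception e) {
                // The mapping function cannot throw checked exceptions, so wrap.
                throw new IllegalStateException("loader failed for key " + k, e);
            }
        });
    }
}
```

Calling get("x", loader) twice with the same key runs the loader only on the first call; the second call returns the stored value. Locking is scoped to the part of the map holding the key rather than to the whole cache, which is the property the thread is looking for in TDB's highly concurrent usage.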
