On Thu, Mar 19, 2015 at 1:37 PM, Andy Seaborne <[email protected]> wrote:

> This is to restart a discussion on a dependency for caching.
>
> Previously,
>   http://markmail.org/thread/em2anabal6rj4vjw
> which is about Guava as a dependency.
>
> Some uses of long term caches:
>   TDB (see JENA-801)
>   Rules (see JENA-901)
>   ARQ
>     DatasetGraphCaching, DatasetImpl (stop graph, model churn)
>     org.apache.jena.riot.system.IRIResolver
>   Fuseki
>     SPARQL Query Caching (JENA-626)
>   Core
>     EnhGraph
>
> JENA-801 reports an experiment using the Guava cache code - we didn't get
> this as a contribution. I assume the prototype was the code I pointed to
> which is a straight replacement for the current implementation with Guava.
> That experiment was an improvement but TDB would benefit from atomic
> get-fill pattern support.  TDB usage is very dependent on concurrency.
>
> Some current implementations Jena has:
>   org.apache.jena.atlas.lib.Cache
>   com.hp.hpl.jena.util.cache.Cache
> as well as ad-hoc solutions using maps (JENA-901)
>
> None are thread-concurrent-safe for get-fill.
>
> == org.apache.jena.atlas.lib.Cache
>
> Most used is an LRU cache based on LinkedHashMap with a drop handler
> wrapper.  The direct mode TDB disk cache uses drop handlers to flush dirty
> blocks to disk.
>
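
For readers who don't know that code, here is a rough sketch of the idea: an
access-ordered LinkedHashMap that calls a drop handler when it evicts the
eldest entry. This is not the org.apache.jena.atlas.lib.Cache source; the
names LruCacheSketch and dropHandler are made up for illustration, and like
the real one it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch only; not the Jena implementation.
class LruCacheSketch<K, V> {
    private final Map<K, V> map;

    LruCacheSketch(final int maxSize, final BiConsumer<K, V> dropHandler) {
        // accessOrder=true keeps entries ordered least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxSize) {
                    // Give the owner a chance to act on the evicted entry,
                    // e.g. write a dirty block back to disk.
                    dropHandler.accept(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // No internal locking: callers must synchronize externally.
    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }
}
```

Note that even get() reorders the map here, so reads need the same external
lock as writes, which is part of why this style does not scale under
concurrent use.
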
> == com.hp.hpl.jena.util.cache.Cache
>
> 2 impls:
>   RandCache - not used in the codebase; created from its own tests.
>   EnhancedNodeCache - finite sized, slot replacement on clash policy
>
> (this is possibly one of the costs of creating graphs - it's 1000 slots
> of memory and Java clears everything first .... IIRC it used to be 5000)
>
> == The Guava interface:
> The key operations are:
>
>   V getIfPresent(Object key);
>   V get(K key, Callable<? extends V> valueLoader);
>
> The latter, "get-and-load-if-absent", is thread-safe and atomic.  It avoids
> needing the pattern
>   "lock", "get", "if absent put", "unlock"
> which over-synchronizes to achieve a get and put in a coordinated fashion
> (this goes beyond JENA-801).
>
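
To make the get-fill point concrete without pulling in Guava, here is a small
stand-in using only the JDK: ConcurrentHashMap.computeIfAbsent gives the same
"load at most once per key" behaviour as Guava's get(key, valueLoader), though
with no size bound or eviction. The class name GetFillSketch and the loads
counter are hypothetical, there only to show the effect.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for an atomic get-fill cache; not Guava, no eviction.
class GetFillSketch<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    // Counts loader invocations, only to make the behaviour observable.
    final AtomicInteger loads = new AtomicInteger();

    GetFillSketch(Function<K, V> loader) { this.loader = loader; }

    // Plain lookup: never triggers a load.
    V getIfPresent(K key) { return map.get(key); }

    // Atomic get-and-load-if-absent: ConcurrentHashMap applies the function
    // at most once per absent key, so there is no cache-wide lock held
    // around a separate get and put.
    V get(K key) {
        return map.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }
}
```

With something like new GetFillSketch<String, Integer>(String::length), many
threads asking for the same missing key should cause a single load, while
lookups of other keys are not serialized behind one lock.
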
> com.google.common.cache.Cache is:
>
> public interface Cache<K, V> {
>   V getIfPresent(Object key);
>   V get(K key, Callable<? extends V> valueLoader);
>   ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
>   void put(K key, V value);
>   void putAll(Map<? extends K,? extends V> m);
>   void invalidate(Object key);
>   void invalidateAll(Iterable<?> keys);
>   void invalidateAll();
>   long size();
>   CacheStats stats();
>   ConcurrentMap<K, V> asMap();
>   void cleanUp();
> }
>
> and a CacheBuilder.
>
> == Other
>
> === Commons JCS
>
> This does have a memory-backed cache; it does not support a lock-efficient
> "get-load" pattern.   It does provide some degree of thread-parallel access
> but it is a cache-wide lock.  Maybe it's possible to use multiple regions
> to shard the cache and get some truly parallel access but it looks like it
> needs programming on top of the JCS artifact.  The design centre of JCS
> seems to be around larger, higher-cost objects than NodeId/Nodes let alone
> small memory computation caches.
>
> ===
>
> What others are there, with the right license and the right design focus?
> There are a lot of cache packages out there but the features of Guava look
> right to me.  Does anyone have real-world experience of using any other
> cache package?
>


I've used the Guava cache before and I really like the design.

We obviously may have some issues with it being a dependency as was
discussed in the email thread you linked.  For my usage, I would have no
problem keeping my application up to date with the latest Guava (as long as
Jena keeps up to date).  But maybe we would need to shade it so other
people won't get conflicting versions.  Presumably when/if Jigsaw arrives
we won't have to worry about this problem any more.

If we do decide to make Guava a dependency, it would be nice (although a
big change) to take a pass through and eliminate utility methods/classes
that we have that are provided by Guava (I'm thinking of
org.apache.jena.atlas.iterator.Iter specifically).

-Stephen
