This is to restart a discussion on a dependency for caching.

Previously,
  http://markmail.org/thread/em2anabal6rj4vjw
which is about Guava as a dependency.

Some uses of long term caches:
  TDB (see JENA-801)
  Rules (see JENA-901)
  ARQ
    DatasetGraphCaching, DatasetImpl (stop graph, model churn)
    org.apache.jena.riot.system.IRIResolver
  Fuseki
    SPARQL Query Caching (JENA-626)
  Core
    EnhGraph

JENA-801 reports an experiment using the Guava cache code - we didn't get this as a contribution. I assume the prototype was the code I pointed to, which is a straight replacement of the current implementation with Guava. That experiment was an improvement, but TDB would also benefit from atomic get-fill support, because TDB usage is heavily concurrent.

Some current implementations Jena has:
  org.apache.jena.atlas.lib.Cache
  com.hp.hpl.jena.util.cache.Cache
as well as ad-hoc solutions using maps (JENA-901)

None is thread-safe for a concurrent get-fill.

== org.apache.jena.atlas.lib.Cache

Most used is an LRU cache based on LinkedHashMap, with a wrapper that adds a drop handler. The direct-mode TDB disk cache uses drop handlers to flush dirty blocks to disk.
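For illustration, the LinkedHashMap-plus-drop-handler pattern can be sketched as below. This is not Jena's actual class, just a minimal sketch of the technique: LinkedHashMap in access-order mode gives LRU behaviour, and removeEldestEntry is the eviction hook where a drop handler (e.g. "flush dirty block") can run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch only: an LRU cache built on LinkedHashMap with a drop handler
// invoked on eviction. Class and parameter names are illustrative.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    private final BiConsumer<K, V> dropHandler;

    LruCache(int maxSize, BiConsumer<K, V> dropHandler) {
        super(16, 0.75f, true); // accessOrder = true => LRU ordering
        this.maxSize = maxSize;
        this.dropHandler = dropHandler;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxSize) {
            // Eviction point: e.g. a TDB-style cache would flush a dirty block here.
            dropHandler.accept(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }
}
```

Note that LinkedHashMap is not thread-safe, so this style of cache needs external synchronization - which is exactly the get-fill problem discussed below.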

== com.hp.hpl.jena.util.cache.Cache

Two implementations:
  RandCache - not used in the codebase; only reachable from its own tests.
  EnhancedNodeCache - finite-sized, with a slot-replacement-on-clash policy

(This is possibly one of the costs of creating graphs - it is 1000 slots of memory, and Java clears everything first ... IIRC it used to be 5000.)
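The slot-replacement-on-clash policy can be sketched as follows. This is illustrative, not the EnhancedNodeCache code: a key hashes to exactly one slot, and a clash simply overwrites the previous occupant, so there is no eviction bookkeeping at all.

```java
// Sketch only: a fixed-slot cache with replace-on-clash. No resizing,
// no LRU ordering - a clash just overwrites the slot's previous entry.
class SlotCache<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCache(int slots) {
        keys = new Object[slots];
        values = new Object[slots];
    }

    private int slot(K key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = slot(key);
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    void put(K key, V value) {
        int i = slot(key);
        keys[i] = key;      // replace whatever was there before
        values[i] = value;
    }
}
```

The cost noted above comes from allocating and zeroing the slot arrays up front, every time a graph is created.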

== The Guava interface
The key operations are:

  V getIfPresent(Object key);
  V get(K key, Callable<? extends V> valueLoader);

The latter "get-and-load-if-absent" is atomic and thread-safe. It avoids the pattern
  "lock", "get", "if absent, put", "unlock"
which over-synchronizes in order to achieve a coordinated get and put (this goes beyond JENA-801).
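The same atomic get-fill guarantee can be demonstrated with plain JDK types - ConcurrentHashMap.computeIfAbsent gives the behaviour that Guava's get(key, valueLoader) provides (the class and method names below are illustrative, not from Jena):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: atomic get-fill using the JDK. The loader runs at most once
// per key even under concurrent access - no explicit lock/get/put/unlock.
public class GetFill {
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    static String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();   // count loader invocations
            return expensiveLoad(k);
        });
    }

    // Stand-in for an expensive computation (e.g. parsing a Node).
    static String expensiveLoad(String k) {
        return k.toUpperCase();
    }
}
```

A second call with the same key returns the cached value without invoking the loader again, which is the property that the lock/get/put/unlock pattern has to over-synchronize to get.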

com.google.common.cache.Cache is:

public interface Cache<K, V> {
  V getIfPresent(Object key);
  V get(K key, Callable<? extends V> valueLoader) throws ExecutionException;
  ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
  void put(K key, V value);
  void putAll(Map<? extends K, ? extends V> m);
  void invalidate(Object key);
  void invalidateAll(Iterable<?> keys);
  void invalidateAll();
  long size();
  CacheStats stats();
  ConcurrentMap<K, V> asMap();
  void cleanUp();
}

and a CacheBuilder.
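Typical usage looks like the sketch below (sizes and value types are illustrative; CacheBuilder.newBuilder(), maximumSize, build() and the two-argument get are the real Guava API):

```java
import java.util.concurrent.ExecutionException;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaExample {
    static int demo() throws ExecutionException {
        // Bounded cache; maximumSize is the eviction bound.
        Cache<String, Integer> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)
                .build();

        // Atomic get-fill: the Callable runs only if "key" is absent,
        // and at most once even under concurrent callers.
        cache.get("key", () -> 42);

        // Loader not invoked here; the cached value is returned.
        return cache.get("key", () -> -1);
    }
}
```

CacheBuilder also covers the drop-handler use case via removalListener, which is what the TDB direct-mode cache would need for flushing dirty blocks.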

== Other

=== Commons JCS

This does have a memory-backed cache, but it does not support a lock-efficient "get-load" pattern. It provides some degree of thread-parallel access, but via a cache-wide lock. It may be possible to use multiple regions to shard the cache and get some truly parallel access, but that looks like it needs programming on top of the JCS artifact. The design centre of JCS seems to be larger, higher-cost objects than NodeIds/Nodes, let alone small in-memory computation caches.

===

What others are there, with the right license and the right design focus? There are a lot of cache packages out there, but the features of Guava look right to me. Does anyone have real-world experience with any other cache package?
