This is to restart a discussion on a dependency for caching.

Previously,
  http://markmail.org/thread/em2anabal6rj4vjw
which is about Guava as a dependency.

Some uses of long term caches:
  TDB (see JENA-801)
  Rules (see JENA-901)
  ARQ
    DatasetGraphCaching, DatasetImpl (stop graph, model churn)
    org.apache.jena.riot.system.IRIResolver
  Fuseki
    SPARQL Query Caching (JENA-626)
  Core
    EnhGraph

JENA-801 reports an experiment using the Guava cache code - we didn't get this as a contribution. I assume the prototype was the code I pointed to, which is a straight replacement of the current implementation with Guava. That experiment was an improvement, but TDB would also benefit from atomic get-fill support, because TDB usage is heavily concurrent.

Some current implementations Jena has:
  org.apache.jena.atlas.lib.Cache
  com.hp.hpl.jena.util.cache.Cache
as well as ad-hoc solutions using maps (JENA-901)

None is thread-safe for a concurrent get-fill.

== org.apache.jena.atlas.lib.Cache

Most used is an LRU cache based on LinkedHashMap, with a wrapper that adds a drop handler. The direct-mode TDB disk cache uses drop handlers to flush dirty blocks to disk.
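For illustration, the LinkedHashMap-plus-drop-handler pattern can be sketched as below. This is not Jena's actual class, just a minimal sketch of the technique: LinkedHashMap in access-order mode gives LRU behaviour, and removeEldestEntry is the eviction hook where a drop handler (e.g. "flush dirty block") can run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch only: an LRU cache built on LinkedHashMap with a drop handler
// invoked on eviction. Class and parameter names are illustrative.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    private final BiConsumer<K, V> dropHandler;

    LruCache(int maxSize, BiConsumer<K, V> dropHandler) {
        super(16, 0.75f, true); // accessOrder = true => LRU ordering
        this.maxSize = maxSize;
        this.dropHandler = dropHandler;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxSize) {
            // Eviction point: e.g. a TDB-style cache would flush a dirty block here.
            dropHandler.accept(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }
}
```

Note that LinkedHashMap is not thread-safe, so this style of cache needs external synchronization - which is exactly the get-fill problem discussed below.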

== com.hp.hpl.jena.util.cache.Cache

Two implementations:
  RandCache - not used in the codebase; only reachable from its own tests.
  EnhancedNodeCache - finite-sized, with a slot-replacement-on-clash policy

(This is possibly one of the costs of creating graphs - it is 1000 slots of memory, and Java clears everything first ... IIRC it used to be 5000.)
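The slot-replacement-on-clash policy can be sketched as follows. This is illustrative, not the EnhancedNodeCache code: a key hashes to exactly one slot, and a clash simply overwrites the previous occupant, so there is no eviction bookkeeping at all.

```java
// Sketch only: a fixed-slot cache with replace-on-clash. No resizing,
// no LRU ordering - a clash just overwrites the slot's previous entry.
class SlotCache<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCache(int slots) {
        keys = new Object[slots];
        values = new Object[slots];
    }

    private int slot(K key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = slot(key);
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    void put(K key, V value) {
        int i = slot(key);
        keys[i] = key;      // replace whatever was there before
        values[i] = value;
    }
}
```

The cost noted above comes from allocating and zeroing the slot arrays up front, every time a graph is created.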

== The Guava interface
The key operations are:

  V getIfPresent(Object key);
  V get(K key, Callable<? extends V> valueLoader);

The latter "get-and-load-if-absent" is atomic and thread-safe. It avoids the pattern
  "lock", "get", "if absent, put", "unlock"
which over-synchronizes in order to achieve a coordinated get and put (this goes beyond JENA-801).
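The same atomic get-fill guarantee can be demonstrated with plain JDK types - ConcurrentHashMap.computeIfAbsent gives the behaviour that Guava's get(key, valueLoader) provides (the class and method names below are illustrative, not from Jena):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: atomic get-fill using the JDK. The loader runs at most once
// per key even under concurrent access - no explicit lock/get/put/unlock.
public class GetFill {
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    static String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();   // count loader invocations
            return expensiveLoad(k);
        });
    }

    // Stand-in for an expensive computation (e.g. parsing a Node).
    static String expensiveLoad(String k) {
        return k.toUpperCase();
    }
}
```

A second call with the same key returns the cached value without invoking the loader again, which is the property that the lock/get/put/unlock pattern has to over-synchronize to get.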

com.google.common.cache.Cache is:

public interface Cache<K, V> {
  V getIfPresent(Object key);
  V get(K key, Callable<? extends V> valueLoader) throws ExecutionException;
  ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
  void put(K key, V value);
  void putAll(Map<? extends K, ? extends V> m);
  void invalidate(Object key);
  void invalidateAll(Iterable<?> keys);
  void invalidateAll();
  long size();
  CacheStats stats();
  ConcurrentMap<K, V> asMap();
  void cleanUp();
}

and a CacheBuilder.
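Typical usage looks like the sketch below (sizes and value types are illustrative; CacheBuilder.newBuilder(), maximumSize, build() and the two-argument get are the real Guava API):

```java
import java.util.concurrent.ExecutionException;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaExample {
    static int demo() throws ExecutionException {
        // Bounded cache; maximumSize is the eviction bound.
        Cache<String, Integer> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)
                .build();

        // Atomic get-fill: the Callable runs only if "key" is absent,
        // and at most once even under concurrent callers.
        cache.get("key", () -> 42);

        // Loader not invoked here; the cached value is returned.
        return cache.get("key", () -> -1);
    }
}
```

CacheBuilder also covers the drop-handler use case via removalListener, which is what the TDB direct-mode cache would need for flushing dirty blocks.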

== Other

=== Commons JCS

This does have a memory-backed cache, but it does not support a lock-efficient "get-load" pattern. It provides some degree of thread-parallel access, but via a cache-wide lock. It may be possible to use multiple regions to shard the cache and get some truly parallel access, but that looks like it needs programming on top of the JCS artifact. The design centre of JCS seems to be larger, higher-cost objects than NodeIds/Nodes, let alone small in-memory computation caches.

===

What others are there, with the right license and the right design focus? There are a lot of cache packages out there, but the features of Guava look right to me. Does anyone have real-world experience with any other cache package?
