This is to restart the discussion about a dependency for caching.
The previous thread, which was about Guava as a dependency, is:
http://markmail.org/thread/em2anabal6rj4vjw
Some uses of long-term caches:
  TDB (see JENA-801)
  Rules (see JENA-901)
  ARQ
    DatasetGraphCaching, DatasetImpl (to stop graph and model churn)
    org.apache.jena.riot.system.IRIResolver
  Fuseki
    SPARQL Query Caching (JENA-626)
  Core
    EnhGraph
JENA-801 reports an experiment using the Guava cache code; we didn't
get this as a contribution. I assume the prototype was the code I
pointed to, which is a straight replacement of the current
implementation with Guava. That experiment was an improvement, but TDB
would benefit from support for an atomic get-fill pattern. TDB usage is
very dependent on concurrent access.
Jena currently has these cache implementations:
  org.apache.jena.atlas.lib.Cache
  com.hp.hpl.jena.util.cache.Cache
as well as ad hoc solutions using maps (JENA-901).
None of them is safe for concurrent get-fill.
== org.apache.jena.atlas.lib.Cache
The most used is an LRU cache based on LinkedHashMap, wrapped to add a
drop handler. The direct-mode TDB disk cache uses drop handlers to flush
dirty blocks to disk.
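For anyone who hasn't looked at that code, here is a minimal sketch of the LinkedHashMap-plus-drop-handler idea. The class name LruCacheSketch and the use of a BiConsumer as the drop handler are illustrative assumptions, not the actual org.apache.jena.atlas.lib.Cache API. It relies on LinkedHashMap's access-order mode and its removeEldestEntry hook:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch: LRU cache on LinkedHashMap with a drop handler. Not thread-safe. */
class LruCacheSketch<K, V> {
    private final Map<K, V> map;

    LruCacheSketch(int maxSize, BiConsumer<K, V> dropHandler) {
        // accessOrder = true: iteration order runs from least- to most-recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxSize) {
                    // Called before the entry is dropped, e.g. to flush a dirty block to disk
                    dropHandler.accept(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    V getIfPresent(K key) { return map.get(key); }   // also marks the key as recently used

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }
}
```

With maxSize 2, putting "a" and "b", reading "a", then putting "c" drops "b", the least recently used entry. Like the LinkedHashMap it wraps, this is not thread-safe; every caller would need external synchronization, which is where the get-fill problem comes from.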
== com.hp.hpl.jena.util.cache.Cache
Two implementations:
  RandCache: not used in the codebase; only created by its own tests.
  EnhancedNodeCache: fixed size, with a replace-the-slot-on-clash policy.
  (This is possibly one of the costs of creating graphs: it is 1000
  slots of memory and Java clears everything first. IIRC it used to be
  5000.)
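To make "slot replacement on clash" concrete, here is a hypothetical sketch; SlotCache is an illustrative name and this is not the EnhancedNodeCache source. Each key hashes to exactly one slot, and a new entry simply overwrites whatever was there:

```java
/** Hypothetical sketch: fixed-size cache where a clashing put replaces the slot's occupant. */
class SlotCache<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCache(int slots) {
        // The JVM initialises both arrays to null at allocation time
        keys = new Object[slots];
        values = new Object[slots];
    }

    private int slot(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    @SuppressWarnings("unchecked")
    V getIfPresent(K key) {
        int i = slot(key);
        // A hit only if this exact key still occupies its slot
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    void put(K key, V value) {
        int i = slot(key);
        // On a clash the previous occupant is overwritten; no probing, no chaining
        keys[i] = key;
        values[i] = value;
    }
}
```

In this sketch, a 1000-slot instance allocates two 1000-element arrays that the JVM nulls out, so if one such cache is created per graph, that allocation cost is paid per graph.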
== The Guava interface:
The key operations are:
    V getIfPresent(Object key);
    V get(K key, Callable<? extends V> valueLoader) throws ExecutionException;
The latter, "get, and load if absent", is atomic and thread-safe. It
avoids needing the pattern
    lock; get; if absent, put; unlock
which over-synchronizes to achieve a coordinated get and put (this goes
beyond JENA-801).
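So as not to assume Guava on the classpath, this sketch uses the JDK's ConcurrentHashMap.computeIfAbsent as a stand-in for Guava's get(K, Callable); the class and method names are hypothetical. It contrasts the lock/get/put/unlock pattern with an atomic get-or-load. The map is unbounded, so it illustrates only the get-fill semantics, not eviction:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch contrasting two get-fill styles. Neither map is bounded. */
class GetFillSketch<K, V> {
    // (1) Coarse pattern: one lock held across get, load and put.
    private final Map<K, V> plain = new HashMap<>();

    synchronized V getOrFillLocked(K key, Function<K, V> loader) {
        V v = plain.get(key);
        if (v == null) {
            v = loader.apply(key);   // every other caller waits, even for unrelated keys
            plain.put(key, v);
        }
        return v;
    }

    // (2) Atomic get-or-load: JDK analogue of Guava's get(K, Callable).
    private final ConcurrentMap<K, V> concurrent = new ConcurrentHashMap<>();

    V getOrFillAtomic(K key, Function<K, V> loader) {
        // The loader is applied at most once per absent key; the loader must not
        // modify this map itself.
        return concurrent.computeIfAbsent(key, loader);
    }
}
```

In the locked version, a slow load (for example, a disk read in TDB) blocks every other caller. ConcurrentHashMap documents that the mapping function is applied at most once per key and that only some updates by other threads may block while it runs. Guava's get(K, Callable) is intended to give the same kind of guarantee on a bounded, evicting cache.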
com.google.common.cache.Cache is:
    public interface Cache<K, V> {
      V getIfPresent(Object key);
      V get(K key, Callable<? extends V> valueLoader) throws ExecutionException;
      ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
      void put(K key, V value);
      void putAll(Map<? extends K, ? extends V> m);
      void invalidate(Object key);
      void invalidateAll(Iterable<?> keys);
      void invalidateAll();
      long size();
      CacheStats stats();
      ConcurrentMap<K, V> asMap();
      void cleanUp();
    }
and there is a CacheBuilder to construct instances.
== Other
=== Commons JCS
This does have a memory-backed cache, but it does not support a
lock-efficient "get-load" pattern. It provides some degree of
thread-parallel access, but through a cache-wide lock. Maybe it's
possible to use multiple regions to shard the cache and get truly
parallel access, but that looks like it needs programming on top of the
JCS artifact. The design centre of JCS seems to be larger, higher-cost
objects than NodeIds/Nodes, let alone small in-memory computation caches.
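The sharding idea can be sketched independently of JCS. This is plain JDK code, not the JCS API; ShardedLruCache and its parameters are hypothetical. Each key hashes to one shard, each shard is a small LRU map with its own lock, and get-fill is atomic within a shard, so callers working on different shards do not contend:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: bounded cache split into independently locked LRU shards. */
class ShardedLruCache<K, V> {
    private final Map<K, V>[] shards;

    @SuppressWarnings("unchecked")
    ShardedLruCache(int shardCount, int maxPerShard) {
        shards = (Map<K, V>[]) new Map[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxPerShard;   // LRU within this shard only
                }
            };
        }
    }

    private Map<K, V> shardFor(Object key) {
        return shards[Math.floorMod(key.hashCode(), shards.length)];
    }

    /** Atomic get-fill; only the key's own shard is locked while loading. */
    V get(K key, Function<? super K, ? extends V> loader) {
        Map<K, V> shard = shardFor(key);
        synchronized (shard) {
            V v = shard.get(key);
            if (v == null) {
                v = loader.apply(key);
                shard.put(key, v);
            }
            return v;
        }
    }

    int size() {
        int n = 0;
        for (Map<K, V> shard : shards) {
            synchronized (shard) { n += shard.size(); }
        }
        return n;
    }
}
```

The trade-offs are that eviction is LRU per shard rather than global, and a slow load still blocks other keys that land in the same shard.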
===
What other packages are there with the right license and the right design
focus? There are a lot of cache packages out there, but Guava's feature set
looks right to me. Does anyone have real-world experience with any other
cache package?