Agreed that Guava should probably be shaded, as they often release
new major versions.

A proper caching structure is also needed to fix
https://issues.apache.org/jira/browse/JENA-901 - so a +1 to try Guava
from me.

On 19 March 2015 at 18:23, Stephen Allen <[email protected]> wrote:
> On Thu, Mar 19, 2015 at 1:37 PM, Andy Seaborne <[email protected]> wrote:
>
>> This is to restart a discussion on a dependency for caching.
>>
>> Previously,
>>   http://markmail.org/thread/em2anabal6rj4vjw
>> which is about Guava as a dependency.
>>
>> Some uses of long term caches:
>>   TDB (see JENA-801)
>>   Rules (see JENA-901)
>>   ARQ
>>     DatasetGraphCaching, DatasetImpl (stop graph, model churn)
>>     org.apache.jena.riot.system.IRIResolver
>>   Fuseki
>>     SPARQL Query Caching (JENA-626)
>>   Core
>>     EnhGraph
>>
>> JENA-801 reports an experiment using the Guava cache code - we didn't get
>> this as a contribution. I assume the prototype was the code I pointed to
>> which is a straight replacement for the current implementation with guava.
>> That experiment was an improvement but TDB would benefit from atomic
>> get-fill pattern support.  TDB usage is very dependent on concurrency.
>>
>> Some current implementations Jena has:
>>   org.apache.jena.atlas.lib.Cache
>>   com.hp.hpl.jena.util.cache.Cache
>> as well as ad-hoc solutions using maps (JENA-901)
>>
>> None are thread-concurrent-safe for get-fill.
>>
>> == org.apache.jena.atlas.lib.Cache
>>
>> Most used is an LRU cache based on LinkedHashMap with a drop handler
>> wrapper.  The direct mode TDB disk cache uses drop handlers to flush dirty
>> blocks to disk.
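
For readers unfamiliar with the pattern, here is a minimal sketch of an LRU cache built on LinkedHashMap with a drop handler. It is not the actual org.apache.jena.atlas.lib.Cache code; the class name LruCacheSketch, the BiConsumer-based handler and the coarse synchronization are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch, not Jena's org.apache.jena.atlas.lib.Cache.
class LruCacheSketch<K, V> {
    private final Map<K, V> map;

    LruCacheSketch(final int maxSize, final BiConsumer<K, V> dropHandler) {
        // accessOrder=true keeps entries ordered least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxSize) {
                    // Let the owner act on the evicted entry,
                    // e.g. flush a dirty block to disk.
                    dropHandler.accept(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // Synchronized because an access-ordered LinkedHashMap is
    // structurally modified even by get().
    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }
}
```

Every operation takes the same lock, so this shape is safe but offers no parallelism, and a get followed by a put is still two separate steps.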
>>
>> == com.hp.hpl.jena.util.cache.Cache
>>
>> 2 impls:
>>   RandCache - not used in the codebase; created from its own tests.
>>   EnhancedNodeCache - finite sized, slot replacement on clash policy
>>
>> (this is possibly one of the costs of creating graphs - it's 1000 slots
>> of memory and Java clears everything first .... IIRC it used to be 5000)
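
As an illustration of "finite sized, slot replacement on clash", the sketch below hashes each key to one fixed slot and overwrites whatever was there. It is not the EnhancedNodeCache source; the class name SlotCacheSketch and the use of two parallel arrays are assumptions.

```java
// Hypothetical sketch of a fixed-size, replace-on-clash cache;
// not the actual EnhancedNodeCache code.
class SlotCacheSketch<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SlotCacheSketch(int slots) {
        // Both arrays are allocated, and zeroed by the JVM, up front.
        keys = new Object[slots];
        values = new Object[slots];
    }

    private int slot(Object key) {
        // floorMod keeps the index non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), keys.length);
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = slot(key);
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    void put(K key, V value) {
        int i = slot(key);
        // On a clash the previous occupant is simply overwritten.
        keys[i] = key;
        values[i] = value;
    }
}
```

The up-front allocation is the cost mentioned above: every new cache pays for all its slots whether or not they are ever used. The sketch is not thread-safe.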
>>
>> == The Guava interface:
>> The key operations are:
>>
>>   V getIfPresent(Object key);
>>   V get(K key, Callable<? extends V> valueLoader);
>>
>> The latter "get-and-load-if-absent" is thread-safe atomic.  It avoids
>> needing the pattern
>>   "lock", "get" "if absent put", "unlock"
>> which over-synchronizes to achieve a get and put in a coordinated fashion
>> (this goes beyond JENA-801).
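
To make the contrast concrete, here is a small sketch of the two patterns using only the JDK. ConcurrentHashMap.computeIfAbsent stands in for Guava's get(key, valueLoader); it has no eviction, so it is not a replacement for a real cache. The class names and the getOrFill method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Pattern 1: "lock", "get", "if absent put", "unlock".
// Every caller, including pure cache hits, serializes on one lock.
class LockedGetFill<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final Object lock = new Object();

    V getOrFill(K key, Function<K, V> loader) {
        synchronized (lock) {
            V value = map.get(key);
            if (value == null) {
                value = loader.apply(key);
                map.put(key, value);
            }
            return value;
        }
    }
}

// Pattern 2: atomic get-and-load-if-absent.
// computeIfAbsent is a JDK stand-in (an assumption of this sketch) for
// Guava's Cache.get(key, valueLoader); unlike Guava it never evicts.
class AtomicGetFill<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    V getOrFill(K key, Function<K, V> loader) {
        // The loader runs at most once for an absent key.
        return map.computeIfAbsent(key, loader);
    }
}
```

In both classes a second request for the same key returns the value loaded by the first; the difference is that the second version does not hold a cache-wide lock while doing so.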
>>
>> com.google.common.cache.Cache is:
>>
>> public interface Cache<K, V>
>>   V getIfPresent(Object key);
>>   V get(K key, Callable<? extends V> valueLoader);
>>   ImmutableMap<K, V> getAllPresent(Iterable<?> keys);
>>   void put(K key, V value);
>>   void putAll(Map<? extends K,? extends V> m);
>>   void invalidate(Object key);
>>   void invalidateAll(Iterable<?> keys);
>>   void invalidateAll();
>>   long size();
>>   CacheStats stats();
>>   ConcurrentMap<K, V> asMap();
>>   void cleanUp();
>>
>> and a CacheBuilder.
>>
>> == Other
>>
>> === Commons JCS
>>
>> This does have a memory-backed cache; it does not support a lock-efficient
>> "get-load" pattern.   It does provide some degree of thread-parallel access
>> but it is a cache-wide lock.  Maybe it's possible to use multiple regions
>> to shard the cache and get some truly parallel access but it looks like it
>> needs programming on top of the JCS artifact.  The design centre of JCS
>> seems to be around larger, higher-cost objects than NodeId/Nodes let alone
>> small memory computation caches.
>>
>> ===
>>
>> What others are there, with the right license and the right design focus?
>> There are a lot of cache packages out there but the features of Guava look
>> right to me.  Does anyone have real-world experience with any other cache
>> package?
>>
>
>
> I've used the Guava cache before and I really like the design.
>
> We obviously may have some issues with it being a dependency as was
> discussed in the email thread you linked.  For my usage, I would have no
> problem keeping my application up to date with the latest Guava (as long as
> Jena keeps up to date).  But maybe we would need to shade it so other
> people won't get conflicting versions.  Presumably when/if Jigsaw arrives
> we won't have to worry about this problem any more.
>
> If we do decide to make Guava a dependency, it would be nice (although a
> big change) to take a pass through and eliminate utility methods/classes
> that we have that are provided by Guava (I'm thinking of
> org.apache.jena.atlas.iterator.Iter specifically).
>
> -Stephen



-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
