[ https://issues.apache.org/jira/browse/ACCUMULO-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997169#comment-15997169 ]

Ben Manes commented on ACCUMULO-4626:
-------------------------------------

I would guess that a distributed cache wouldn't offer many benefits, if we can 
assume that data consistency is already handled by the database (since that is 
a fundamental need). Replication, remote calls, etc. sound like overhead. 
Most distributed caches have slow in-memory caches, because network and 
serialization are their primary bottlenecks. Perhaps integrating Cassandra's 
off-heap cache would be a better and easier option, since it was extracted into a 
reusable library ([ohc|https://github.com/snazy/ohc]).
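To illustrate what "off-heap" means here, the following is a minimal, hypothetical Java sketch; it is not ohc's API, and the class and method names are assumptions for illustration only. Block bytes are copied into direct ByteBuffers outside the Java heap, so the garbage collector does not scan them, but every put and get pays a copy, which is the in-process counterpart of the serialization cost mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy showing the idea behind an off-heap block cache.
 * NOT ohc's API: no capacity bound, no eviction, not thread-safe.
 */
public class OffHeapBlockStore {
  // The index itself stays on-heap in this sketch; only block bytes live off-heap.
  private final Map<String, ByteBuffer> blocks = new HashMap<>();

  /** Copies the block into native (off-heap) memory. */
  public void put(String blockName, byte[] data) {
    ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
    buf.put(data);
    buf.flip(); // position = 0, limit = data.length, ready for reading
    blocks.put(blockName, buf);
  }

  /** Copies the block back onto the heap, or returns null on a miss. */
  public byte[] get(String blockName) {
    ByteBuffer buf = blocks.get(blockName);
    if (buf == null) {
      return null;
    }
    byte[] copy = new byte[buf.remaining()];
    buf.duplicate().get(copy); // duplicate() so the stored buffer's position is not moved
    return copy;
  }

  public static void main(String[] args) {
    OffHeapBlockStore store = new OffHeapBlockStore();
    store.put("rfile-A/block-0", new byte[] {1, 2, 3});
    byte[] read = store.get("rfile-A/block-0");
    System.out.println(read.length);
    System.out.println(store.get("missing"));
  }
}
```

Unlike a real off-heap cache, this toy keeps its index in an on-heap HashMap, never evicts, and leaves freeing the native memory to the GC.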

If caches were pluggable, then a JCache adapter might be possible. It's not a great 
standard, but it would allow using any JCache provider (most of which are 
open-source). [~kturner] was pondering a pluggable layer in the TinyLFU patch.
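A pluggable layer could be as small as a two-method interface that the tserver codes against, with the implementation chosen by configuration. The sketch below assumes a hypothetical BlockCacheLayer interface (not Accumulo's actual API) and shows one bounded LRU implementation built on LinkedHashMap's access order. A JCache adapter would implement the same interface by delegating to javax.cache.Cache#get and Cache#put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PluggableCacheSketch {

  /** Hypothetical pluggable block-cache contract; not Accumulo's actual interface. */
  interface BlockCacheLayer {
    byte[] getBlock(String blockName); // returns null on a miss

    void cacheBlock(String blockName, byte[] block);
  }

  /** One possible implementation: a bounded LRU backed by LinkedHashMap. */
  static final class LruBlockCache implements BlockCacheLayer {
    private final Map<String, byte[]> map;

    LruBlockCache(int maxBlocks) {
      // accessOrder = true: iteration order runs from least to most recently accessed
      this.map = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          return size() > maxBlocks; // drop the LRU entry once over capacity
        }
      };
    }

    @Override
    public synchronized byte[] getBlock(String blockName) {
      return map.get(blockName);
    }

    @Override
    public synchronized void cacheBlock(String blockName, byte[] block) {
      map.put(blockName, block);
    }
  }

  public static void main(String[] args) {
    // A real server would pick the implementation from configuration; hard-coded here.
    BlockCacheLayer cache = new LruBlockCache(2);
    cache.cacheBlock("a", new byte[1]);
    cache.cacheBlock("b", new byte[1]);
    cache.getBlock("a");                // touch "a" so "b" becomes the eldest entry
    cache.cacheBlock("c", new byte[1]); // over capacity: evicts "b"
    System.out.println(cache.getBlock("b") == null);
    System.out.println(cache.getBlock("a") != null);
  }
}
```

Keeping callers on the interface is what would let a TinyLFU cache, an off-heap cache, or a JCache adapter be swapped in without touching the read path.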

> improve cache hit rate via weak reference map
> ---------------------------------------------
>
>                 Key: ACCUMULO-4626
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4626
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>              Labels: performance, stability
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When a single iterator tree references the same RFile blocks in different 
> branches we sometimes get cache misses for one iterator even though the 
> requested block is held in memory by another iterator. This is particularly 
> important when using something like the IntersectingIterator to intersect 
> many deep copies. Instead of evicting blocks completely, keeping evicted blocks 
> in a WeakReference value map can avoid re-reading blocks that are still 
> referenced by another deep-copied source iterator.
> We've seen this in the field for some of Sqrrl's queries against very large 
> tablets. The total memory usage for these queries can be equal to the size of 
> all the iterator block reads times the number of readahead threads times the 
> number of files times the number of IntersectingIterator children when cache 
> miss rates are high. This might work out to something like:
> {code}
> 16 readahead threads * 200 deep copied children * 99% cache miss rate * 20 
> files * 252KB per reader = ~16GB of memory
> {code}
> In most cases, evicting to a weak reference value map changes the cache miss 
> rate from very high to very low and has a dramatic effect on total memory 
> usage.
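A minimal sketch of the weak-reference idea, assuming a simple block-count bound and string block names. The class WeakEvictionBlockCache and its methods are hypothetical and not Accumulo's LruBlockCache. Blocks evicted from the strong LRU tier are demoted to a map of WeakReference values, so a lookup still succeeds while another deep-copied iterator holds the block, and the GC reclaims the block once nobody does.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU block cache that demotes evicted blocks to weak references. */
public class WeakEvictionBlockCache {
  private final int maxBlocks;
  // Strong tier in access order: iteration starts at the least recently used block.
  private final LinkedHashMap<String, byte[]> lru = new LinkedHashMap<>(16, 0.75f, true);
  // Weak tier: an evicted block survives only while some iterator still references it.
  private final Map<String, WeakReference<byte[]>> evicted = new HashMap<>();
  private int hits;
  private int misses;

  public WeakEvictionBlockCache(int maxBlocks) {
    this.maxBlocks = maxBlocks;
  }

  public synchronized byte[] getBlock(String name) {
    byte[] block = lru.get(name);
    if (block == null) {
      WeakReference<byte[]> ref = evicted.remove(name);
      block = (ref == null) ? null : ref.get();
      if (block != null) {
        insert(name, block); // still in memory elsewhere: promote back to the strong tier
      }
    }
    if (block == null) {
      misses++;
    } else {
      hits++;
    }
    return block;
  }

  public synchronized void cacheBlock(String name, byte[] block) {
    evicted.remove(name);
    insert(name, block);
  }

  public synchronized int hits() { return hits; }

  public synchronized int misses() { return misses; }

  private void insert(String name, byte[] block) {
    lru.put(name, block);
    if (lru.size() > maxBlocks) {
      Iterator<Map.Entry<String, byte[]>> it = lru.entrySet().iterator();
      Map.Entry<String, byte[]> eldest = it.next();
      String eldestName = eldest.getKey();
      byte[] eldestBlock = eldest.getValue();
      it.remove();
      // Instead of dropping the block outright, keep only a weak reference to it.
      evicted.put(eldestName, new WeakReference<>(eldestBlock));
    }
  }

  public static void main(String[] args) {
    WeakEvictionBlockCache cache = new WeakEvictionBlockCache(1);
    byte[] heldByOtherIterator = new byte[252 * 1024];
    cache.cacheBlock("file1/block0", heldByOtherIterator);
    cache.cacheBlock("file1/block1", new byte[252 * 1024]); // demotes block0 to the weak tier
    // A deep copy asks for block0 while another iterator still holds it: a hit, not a re-read.
    boolean sameBlock = cache.getBlock("file1/block0") == heldByOtherIterator;
    System.out.println(sameBlock + " hits=" + cache.hits() + " misses=" + cache.misses());
  }
}
```

Stale map entries whose referents have already been collected are not purged in this sketch; polling a ReferenceQueue is one way to remove them. Because the GC decides when a weakly reachable block disappears, hits from the weak tier are opportunistic rather than guaranteed.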

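The memory estimate in the issue description is a plain product of its factors; the hypothetical helper below reproduces it. For the quoted inputs it gives 16 × 200 × 0.99 × 20 × 252 KB = 15,966,720 KB, which is about 16 GB taking 1 GB as 10^6 KB (about 15.2 GiB in binary units).

```java
/** Hypothetical helper reproducing the worst-case memory estimate from the description. */
public class ReadaheadMemoryEstimate {

  /** Product of the factors listed in the description, returned in KB. */
  static double estimateKb(int readaheadThreads, int deepCopiedChildren,
                           double cacheMissRate, int files, double kbPerReader) {
    return (double) readaheadThreads * deepCopiedChildren * files * kbPerReader * cacheMissRate;
  }

  public static void main(String[] args) {
    double kb = estimateKb(16, 200, 0.99, 20, 252);
    System.out.printf("%.0f KB, about %.1f GB (%.1f GiB)%n",
        kb, kb / 1_000_000, kb / (1024.0 * 1024.0));
  }
}
```

Because the miss rate is a direct factor, lowering it shrinks the estimate proportionally, which is why a weak reference map that turns most misses into hits has such a large effect on total memory.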


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
