[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229284#comment-14229284 ]

Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

I have pushed the latest OHC changes to https://github.com/snazy/ohc. It has 
been nearly completely rewritten.

Architecture (in brief):
* OHC consists of multiple segments (default: 2 x #CPUs). Fewer segments lead 
to more contention; more segments give no measurable improvement.
* Each segment contains an off-heap hash map (defaults: table-size=8192, 
load-factor=.75). The hash table itself requires 8 bytes per bucket.
* Hash entries within a bucket are organized in a doubly-linked list.
* The LRU replacement policy is built in via its own doubly-linked list.
* Critical sections that exclusively lock a segment are kept short (in both 
code and CPU time) - just a plain 'synchronized' block, no 
StampedLock/ReentrantLock.
* Cache capacity is configured globally and managed "locally" within each 
segment.
* Eviction (or "replacement" or "cleanup") is triggered when free capacity 
drops below a trigger value and cleans up until a target free capacity is 
reached.
* Uses murmur hash on the serialized key. The most significant bits select 
the segment, the least significant bits the bucket in the segment's hash map.
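
The hash-bit split described above can be sketched roughly as follows (class 
and constant names are mine, not OHC's; the segment count is assumed to be a 
power of two so both indices reduce to mask/shift operations):

```java
// Sketch only: how a 64-bit hash can be split so the most significant bits
// select the segment and the least significant bits select the bucket.
public class HashSplitSketch {
    static final int SEGMENT_COUNT = 16;    // assumption: 2 x #CPUs, power of two
    static final int TABLE_SIZE = 8192;     // per-segment default table size
    static final int SEGMENT_SHIFT = 64 - Integer.numberOfTrailingZeros(SEGMENT_COUNT);

    static int segmentIndex(long hash) {
        return (int) (hash >>> SEGMENT_SHIFT);   // most significant bits
    }

    static int bucketIndex(long hash) {
        return (int) (hash & (TABLE_SIZE - 1));  // least significant bits
    }

    public static void main(String[] args) {
        long h = 0xCAFEBABE_DEADBEEFL;
        // prints segment=12 bucket=7919
        System.out.println("segment=" + segmentIndex(h) + " bucket=" + bucketIndex(h));
    }
}
```

Because distinct bit ranges are used, keys that collide in one segment's table 
do not influence which segment other keys land in.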

Non-production-relevant items:
* Off-heap access can be started in a "debug" mode that checks for accesses 
outside the allocated region and throws exceptions instead of causing SIGSEGV 
or jemalloc errors
* ohc-benchmark updated to reflect the changes

About the replacement policy: currently LRU is built in - but I'm not really 
sold on LRU as-is. Alternatives could be:
* timestamp (not sold on this either - basically the same as LRU)
* LIRS (https://en.wikipedia.org/wiki/LIRS_caching_algorithm), big overhead 
(space)
* 2Q (counts accesses, divides the counters regularly)
* LRU+random (50/50) (may give the same result as LIRS, but without LIRS' 
overhead)
But replacing LRU with something else is out of scope for this ticket and 
should be evaluated with real workloads in C* - although the last one is 
"just" an additional config parameter.
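
For illustration, the "LRU+random (50/50)" idea could look like the sketch 
below. This is purely hypothetical on-heap code (not OHC's implementation); it 
uses an access-ordered LinkedHashMap in place of OHC's off-heap doubly-linked 
lists:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Random;

// Hypothetical sketch of "LRU+random (50/50)": half of the evictions remove
// the least-recently-used entry, half remove a randomly chosen entry.
public class LruRandomSketch<K, V> {
    private final int capacity;
    // accessOrder=true keeps entries ordered from least to most recently used
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private final Random random = new Random();

    public LruRandomSketch(int capacity) {
        this.capacity = capacity;
    }

    public void put(K key, V value) {
        map.put(key, value);
        while (map.size() > capacity) {
            evictOne();
        }
    }

    public V get(K key) {
        return map.get(key);
    }

    public int size() {
        return map.size();
    }

    private void evictOne() {
        Iterator<K> it = map.keySet().iterator();
        if (random.nextBoolean()) {
            // random victim: advance past a random number of entries
            for (int skip = random.nextInt(map.size()); skip > 0; skip--) {
                it.next();
            }
        }
        it.next();   // with no skips, this is the LRU (first) entry
        it.remove();
    }

    public static void main(String[] args) {
        LruRandomSketch<String, Integer> cache = new LruRandomSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);   // exceeds capacity, triggers one eviction
        System.out.println("size=" + cache.size());   // prints size=2
    }
}
```

The point of the 50/50 mix is that a purely random component occasionally 
protects entries that strict LRU would evict, at essentially zero extra space 
cost - which is why it's "just" a config parameter on top of the existing LRU 
list.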

IMO we should add a per-table option that configures whether the row cache 
receives data on reads+writes or on reads only. That might prevent write-heavy 
tables from filling the cache with garbage.

{{Unsafe.allocateMemory()}} gives about a 5-10% performance improvement 
compared to jemalloc. The reason might be the JNA library, which contains some 
synchronized blocks.
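
For reference, a minimal sketch of allocating through {{Unsafe}} (illustrative 
only; OHC's actual allocator code differs, and the reflective access to 
{{theUnsafe}} is the usual workaround for the private constructor):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch: direct off-heap allocation via Unsafe.allocateMemory(), bypassing
// JNA entirely. The caller is responsible for freeing the memory.
public class UnsafeAllocSketch {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Allocate 8 off-heap bytes, write a long, read it back, free, return it.
    static long roundTrip(long value) {
        long address = UNSAFE.allocateMemory(8);   // raw, unmanaged bytes
        try {
            UNSAFE.putLong(address, value);        // no bounds checks here
            return UNSAFE.getLong(address);
        } finally {
            UNSAFE.freeMemory(address);            // must free, or it leaks
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(roundTrip(0x7438L)));   // prints 7438
    }
}
```

The lack of any locking on this path is consistent with the measured advantage 
over the jemalloc-through-JNA route.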

IMO OHC is ready to be merged into C* code base.


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off-heap; keys are still stored on 
> the JVM heap as BB.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely 
> off-heap and use JNI to interact with the cache. We might want to ensure that 
> the new implementation matches the existing APIs (ICache), and the 
> implementation needs to have safe memory access, low memory overhead, and as 
> few memcpy's as possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
