[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222365#comment-14222365 ]

Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

I've spent some evenings on an alternative approach for an off-heap row cache, 
too.
It uses a different concept and architecture:
* Based on a big hash table
* Each hash partition (segment) has a reference to an LRU linked list of its hash 
entries. Each "get" operation moves the accessed entry to the head of the LRU 
linked list.
* Data memory is divided into uniform blocks (a few kB each) and managed by 
multiple (8) free-block linked lists. There is just one big memory allocation, 
during initialization. Pro: no fragmentation of free memory, easier to handle. 
Con: fragmentation of data.
* Proactive eviction with the goal to keep a percentage of memory free.
* The put operation (currently) fails if there's not enough memory available to 
store the data. The idea is not to block the calling code ("don't put additional 
latency on an overloaded system").
* Locks (CAS-based) exist on each hash partition, each hash entry and each free 
list, and are held as briefly as possible (e.g. "put" allocates data blocks, fills 
them with the data of the new entry, acquires the lock on the hash partition, 
updates the LRU linked list pointers and finishes). A rough sketch of this 
get/put flow follows the list.
* To keep the linked lists on each hash partition (segment) short, a large hash 
table should be used.
* No rehash yet - it could be handled by locking one hash partition at a time and 
splitting it into two new partitions (more logic, but no global lock).
* No overhead in the JVM heap for the cache itself (although accesses require 
short-lived objects for serialization).
* Only "stolen" thing is Vijay's benchmark (asked him before ;) ).

Pushed here: https://github.com/snazy/ohc - there's a more descriptive README, too.

Other ideas:
* If we have off-heap data, it might be possible to (de)serialize the hot set 
directly to/from that off-heap memory (zero-copy I/O), at the cost of changing 
the on-disk data format. A small illustration of the idea follows.
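
A tiny, purely illustrative sketch of that idea (invented names; it assumes the 
cached rows already sit in a direct ByteBuffer in the on-disk format): saving 
and reloading the hot set would then never copy the data onto the JVM heap.

{code}
// Illustration only: if cached rows already live off heap in the on-disk format,
// the hot set can be written and read back without an intermediate on-heap copy.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class HotSetIoSketch
{
    /** Writes a direct (off-heap) buffer holding serialized rows straight to disk. */
    static void save(ByteBuffer directRowData, Path file) throws IOException
    {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE))
        {
            while (directRowData.hasRemaining())
                ch.write(directRowData);                // no copy through the JVM heap
        }
    }

    /** Reads the hot set back into off-heap memory. */
    static ByteBuffer load(Path file) throws IOException
    {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ))
        {
            ByteBuffer buf = ByteBuffer.allocateDirect((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) >= 0)
                ;                                       // fill the direct buffer from the channel
            buf.flip();
            return buf;
        }
    }
}
{code}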


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch
>
>
> Currently SerializingCache is only partially off heap; keys are still stored on 
> the JVM heap as ByteBuffers (BB):
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * The memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs (ICache), and the implementation 
> needs to have safe memory access, low memory overhead and as few memcpys as 
> possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
