[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

Robert Stupp (JIRA) Thu, 18 Dec 2014 03:31:39 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251537#comment-14251537
 ]


Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

I’ve nearly finished the OHC implementation. Unit tests cover all functionality 
required by C*, and a separate test-only implementation is now used to verify 
the real one (entry (de)serialization is not yet extensively covered by the 
tests). The OHC interface has been changed to match the functionality required 
by C*.

Maven executes the unit tests both with and without jemalloc (only if jemalloc 
is installed, of course).

[~aweisberg], [~benedict] can you have a look at the current OHC code?

I’d like to know how it could/should be integrated into C*. IMO there are two 
decisions to be made:
* Whether to migrate the whole OHC code into the org.apache.cassandra codebase 
(with the option to turn it on or off).
* Whether to implement a "pluggable row cache" (to allow multiple 
implementations).
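
A minimal sketch of what the "pluggable row cache" option could look like, 
assuming a small cache interface and a name-based registry. All names here 
({{PluggableRowCache}}, {{OnHeapRowCache}}, {{DisabledRowCache}}, 
{{RowCacheRegistry}}) are hypothetical; this is neither C*'s ICache nor the OHC 
API, and the on-heap map only stands in for a real implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical interface; not Cassandra's ICache and not the OHC API.
interface PluggableRowCache<K, V> {
    V get(K key);
    void put(K key, V value);
    void remove(K key);
    long size();
}

// Placeholder on-heap implementation standing in for a real cache.
final class OnHeapRowCache<K, V> implements PluggableRowCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }
    public void remove(K key) { map.remove(key); }
    public long size() { return map.size(); }
}

// The "off" switch: a cache that never stores anything.
final class DisabledRowCache<K, V> implements PluggableRowCache<K, V> {
    public V get(K key) { return null; }
    public void put(K key, V value) { }
    public void remove(K key) { }
    public long size() { return 0; }
}

// Maps a configured name to a factory for the matching implementation.
final class RowCacheRegistry {
    private final Map<String, Supplier<PluggableRowCache<String, byte[]>>> providers =
            new ConcurrentHashMap<>();

    void register(String name, Supplier<PluggableRowCache<String, byte[]>> factory) {
        providers.put(name, factory);
    }

    PluggableRowCache<String, byte[]> create(String name) {
        Supplier<PluggableRowCache<String, byte[]>> factory = providers.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown row cache: " + name);
        }
        return factory.get();
    }
}
```

With such a registry, an OHC-backed implementation would be one more 
{{register(...)}} call, and turning the cache off would mean selecting the 
no-op implementation by name.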

I've got some ideas regarding the row cache which are out of scope of this 
ticket:
* A new per-table knob to control whether entries are populated into the row 
cache on reads+writes or just on reads (to target different workloads).
* Reconsider whether to keep the current {{RowCacheSentinel}} implementation 
as is - if I understand it correctly, it just reduces the number of cache-put 
operations (a cache hit on a sentinel performs a disk read). Is it a compromise 
regarding additional serialization cost?
* Improve key (de)serialization (saving the row cache to disk) - use 
direct I/O.
* Reduce the value deserialization effort - let C* directly access a 
cached row in off-heap memory instead of paying the deserialization (and 
on-heap object construction) overhead.
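
To illustrate the first idea, here is a rough sketch of a per-table population 
knob. The names {{CachePopulation}} and {{KnobbedRowCache}} are hypothetical, 
rows are modelled as plain strings, and invalidating the entry on a write in 
reads-only mode is an assumption made for the sketch, not a description of what 
C* does:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-table setting.
enum CachePopulation { READS_ONLY, READS_AND_WRITES }

final class KnobbedRowCache {
    private final Map<String, String> cache = new HashMap<>();
    private final CachePopulation mode;

    KnobbedRowCache(CachePopulation mode) { this.mode = mode; }

    // Reads always populate the cache on a miss.
    String read(String key, Function<String, String> diskRead) {
        return cache.computeIfAbsent(key, diskRead);
    }

    // Writes populate the cache only in READS_AND_WRITES mode;
    // otherwise the entry is dropped (assumption: avoid serving stale rows).
    void write(String key, String row) {
        if (mode == CachePopulation.READS_AND_WRITES) {
            cache.put(key, row);
        } else {
            cache.remove(key);
        }
    }

    boolean isCached(String key) { return cache.containsKey(key); }
}
```

A write-heavy table that rarely re-reads its rows would pick {{READS_ONLY}} to 
avoid filling the cache with entries that are never hit.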

Note: although the jemalloc allocator provides a {{getTotalAllocated()}} 
method, the result is not correct and I don't know why. The result depends on 
jemalloc configure settings ({{--en/disable-tcache}}). According to the 
man-page the result should be correct (sum of {{stats.allocated}} and 
{{stats.huge.allocated}}), but it isn't (verified with a "coded memory leak of 
small allocations" that didn't increase the value). Iterating over the jemalloc 
_arenas_ and _bins_ does not help, since the two values mentioned above are 
aggregations of these.
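
The "coded memory leak" check can be sketched roughly as follows. The 
{{OffHeapAllocator}} interface and the {{CountingAllocator}} stand-in are 
hypothetical; only the method name {{getTotalAllocated()}} comes from the note 
above. The stand-in keeps its own byte counter and does not call jemalloc, so 
it shows the shape of the check rather than reproducing the wrong result:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical allocator abstraction for the sketch.
interface OffHeapAllocator {
    long allocate(long size);
    void free(long address);
    long getTotalAllocated();
}

// Stand-in that tracks bytes itself; it does not call jemalloc.
final class CountingAllocator implements OffHeapAllocator {
    private final Map<Long, Long> live = new HashMap<>();
    private long nextAddress = 1;
    private long total;

    public long allocate(long size) {
        long address = nextAddress++;
        live.put(address, size);
        total += size;
        return address;
    }

    public void free(long address) {
        Long size = live.remove(address);
        if (size != null) {
            total -= size;
        }
    }

    public long getTotalAllocated() { return total; }
}

final class LeakCheck {
    // Allocates `count` blocks of `size` bytes, never frees them,
    // and returns how much the reported total grew.
    static long leakedDelta(OffHeapAllocator allocator, int count, long size) {
        long before = allocator.getTotalAllocated();
        for (int i = 0; i < count; i++) {
            allocator.allocate(size);
        }
        return allocator.getTotalAllocated() - before;
    }
}
```

Against a real allocator the delta should be at least {{count * size}} (size 
classes may round each block up); a delta of zero after such a leak is the 
symptom described above.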


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored 
> on the JVM heap as BB.
> * GC costs are higher for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs (ICache), and the implementation 
> needs to have safe memory access, low memory overhead and fewer memcpys 
> (as much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
