[
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251537#comment-14251537
]
Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------
I’ve nearly finished the OHC implementation. Unit tests cover all functionality
required by C*, and a separate test-only implementation is now used to verify
the implementation (entry (de)serialization is not yet extensively covered by
the tests). The OHC interface has been changed to match the functionality
required by C*. Maven executes the unit tests both with and without jemalloc
(the jemalloc run only if jemalloc is installed, of course).
[~aweisberg], [~benedict] can you have a look at the current OHC code?
I’d like to know how it could/should be integrated in C*. IMO there are two
decisions to be made:
* Whether to migrate the whole OHC code into the org.apache.cassandra codebase
(with the option to turn it on or off).
* Whether to implement a “pluggable row cache” (to allow multiple
implementations).
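To make the second option more concrete, here is a minimal sketch of what a pluggable row cache could look like. All names in it ({{SimpleCache}}, {{RowCacheProvider}}, {{OnHeapProvider}}, {{RowCacheProviders}}) are hypothetical and are taken from neither C* nor OHC; it only assumes that the implementation is selected by class name from configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal cache contract; the real C* ICache has more methods.
interface SimpleCache<K, V> {
    void put(K key, V value);
    V get(K key);
    void remove(K key);
    long size();
}

// Hypothetical factory interface that each cache implementation would provide.
interface RowCacheProvider<K, V> {
    SimpleCache<K, V> create(long capacityInEntries);
}

// Trivial on-heap implementation, used here only as a stand-in.
class OnHeapProvider<K, V> implements RowCacheProvider<K, V> {
    public SimpleCache<K, V> create(long capacityInEntries) {
        final Map<K, V> map = new ConcurrentHashMap<>();
        return new SimpleCache<K, V>() {
            public void put(K key, V value) {
                // Sketch only: new keys are rejected when full instead of evicting.
                if (map.size() < capacityInEntries || map.containsKey(key))
                    map.put(key, value);
            }
            public V get(K key) { return map.get(key); }
            public void remove(K key) { map.remove(key); }
            public long size() { return map.size(); }
        };
    }
}

final class RowCacheProviders {
    private RowCacheProviders() {}

    // Instantiates the provider named in configuration via its no-arg constructor.
    @SuppressWarnings("unchecked")
    static <K, V> RowCacheProvider<K, V> load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (RowCacheProvider<K, V>) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load row cache provider " + className, e);
        }
    }
}
```

With something like this, an OHC-backed provider and the existing cache could live side by side, and turning OHC "on or off" would reduce to choosing a class name.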
I've got some ideas regarding the row cache which are out of scope of this ticket:
* A new per-table knob to control whether entries are populated into the row
cache on reads+writes or just on reads (to target different workloads).
* Reconsider whether to keep the current {{RowCacheSentinel}} implementation
as is - if I understand it correctly, it just reduces the number of cache-put
operations (a cache hit on a sentinel performs a disk read). Is it a compromise
regarding additional serialization cost?
* Improve key (de)serialization (saving the row cache to disk) - use
direct I/O.
* Optimize value deserialization - let C* directly access a cached row in
off-heap memory, avoiding the deserialization (and on-heap object
construction) overhead.
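The first idea in the list above (a per-table knob for populating the row cache) could look roughly like the following sketch. The enum {{RowCachePopulate}}, the class {{TableRowCache}} and its methods are hypothetical names for illustration and do not exist in C*; a plain map stands in for the real row cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-table setting.
enum RowCachePopulate { ON_READ, ON_READ_AND_WRITE }

class TableRowCache<K, V> {
    private final RowCachePopulate policy;
    private final Map<K, V> cache = new HashMap<>(); // stand-in for the real row cache

    TableRowCache(RowCachePopulate policy) { this.policy = policy; }

    // Read path: serve from the cache, otherwise load (e.g. from disk) and populate.
    V read(K key, Function<K, V> loader) {
        V cached = cache.get(key);
        if (cached != null)
            return cached;
        V loaded = loader.apply(key);
        if (loaded != null)
            cache.put(key, loaded);
        return loaded;
    }

    // Write path: populate only if the table opted in; otherwise invalidate
    // so that a later read does not see a stale entry.
    void write(K key, V value) {
        if (policy == RowCachePopulate.ON_READ_AND_WRITE)
            cache.put(key, value);
        else
            cache.remove(key);
    }

    boolean isCached(K key) { return cache.containsKey(key); }
}
```

A write-heavy table would keep {{ON_READ}} to avoid churning the cache, while a table that reads its own writes soon afterwards could opt into {{ON_READ_AND_WRITE}}.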
Note: although the jemalloc allocator provides a {{getTotalAllocated()}}
method, its result is not correct and I don't know why. The result depends on
the jemalloc configure settings ({{--enable-tcache}}/{{--disable-tcache}}).
According to the man page the result should be correct (sum of
{{stats.allocated}} and {{stats.huge.allocated}}), but it isn't (verified with
a "coded memory leak of small allocations" that didn't increase the value).
Iterating over the jemalloc _arenas_ and _bins_ does not help, since the two
mentioned values are aggregations of these.
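The check described in the note (leak many small allocations and see whether the reported total grows) could be written along these lines. The {{Allocator}} interface, {{CountingAllocator}} and {{LeakCheck}} are hypothetical; only the method name {{getTotalAllocated()}} comes from the note, and a real test would plug in the jemalloc-backed allocator instead of the counting stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical allocator contract; only getTotalAllocated() is named in the note.
interface Allocator {
    long allocate(long bytes);   // returns an opaque address
    void free(long address);
    long getTotalAllocated();
}

// Stand-in that tracks requested bytes itself, so the sketch runs without native code.
class CountingAllocator implements Allocator {
    private final Map<Long, Long> sizes = new HashMap<>();
    private long nextAddress = 1;
    private long total;

    public synchronized long allocate(long bytes) {
        long address = nextAddress++;
        sizes.put(address, bytes);
        total += bytes;
        return address;
    }

    public synchronized void free(long address) {
        Long bytes = sizes.remove(address);
        if (bytes != null)
            total -= bytes;
    }

    public synchronized long getTotalAllocated() { return total; }
}

final class LeakCheck {
    private LeakCheck() {}

    // Deliberately leaks `count` allocations of `size` bytes each and returns
    // by how much the reported total grew.
    static long leakAndMeasure(Allocator allocator, int count, long size) {
        long before = allocator.getTotalAllocated();
        for (int i = 0; i < count; i++)
            allocator.allocate(size); // never freed on purpose
        return allocator.getTotalAllocated() - before;
    }
}
```

For an allocator with correct accounting the returned growth should be at least {{count * size}}; a growth of zero is the symptom described above.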
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap; keys are still stored in
> the JVM heap as BB:
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing APIs (ICache), and the implementation
> needs to have safe memory access, low memory overhead and fewer memcpy calls
> (as much as possible).
> We might also want to make this cache configurable.