[
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225371#comment-14225371
]
Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------
bq. Are you suggesting a configurable logging/exception handling in case the 2
threads throw exceptions? If yes sure. Other exceptions AFAIK are already
propagated. (Still needs cleanup though).
Something has to happen to the exceptions generated there. Since this is a
library and there is no caller to propagate them to, it implies that people
need to provide a listener or a logger.
bq. Segments are small in memory so far in my tests,
Segments are hash buckets, correct? They aren't segments of several hash
buckets. If the goal of the hash table is to have at most two or three entries
per segment, then having an on-heap Java object per segment would be a lot of
overhead. Just as a guess, we are talking about two objects: the
Segment/ReentrantLock, and the AbstractQueuedSynchronizer allocated by
ReentrantLock, which has three additional fields. That's 48 bytes without
alignment or object headers. There is also the overhead of having an
AtomicReferenceArray of pointers to each segment object. A hash table bucket
only has to be a pointer plus a lock field if you are going to lock buckets,
and you could do that in 8-12 bytes.
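To make the comparison concrete, here is a minimal sketch of the off-heap
bucket layout described above: one 8-byte pointer slot per bucket, allocated
with Unsafe, with no per-bucket Java objects at all. The names (`BucketTable`,
`bucketAddress`, the bucket size) are illustrative assumptions, not code from
the attached patch.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical off-heap bucket table: each bucket is a single 8-byte slot
// holding the address of the first entry in its chain (0 means empty).
public class BucketTable {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final int BUCKET_SIZE = 8; // one 64-bit pointer per bucket

    private final long base;   // address of the bucket array
    private final int buckets; // number of buckets

    public BucketTable(int buckets) {
        this.buckets = buckets;
        long bytes = (long) buckets * BUCKET_SIZE;
        this.base = UNSAFE.allocateMemory(bytes);
        UNSAFE.setMemory(base, bytes, (byte) 0); // all buckets start empty
    }

    // Address of the bucket slot for a given hash.
    long bucketAddress(int hash) {
        int idx = (hash & 0x7fffffff) % buckets;
        return base + (long) idx * BUCKET_SIZE;
    }

    long head(int hash)                  { return UNSAFE.getLong(bucketAddress(hash)); }
    void setHead(int hash, long entry)   { UNSAFE.putLong(bucketAddress(hash), entry); }

    public void free() { UNSAFE.freeMemory(base); }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```

At 8 bytes per bucket (or 12 with a separate lock word), the per-bucket cost is
a small fraction of the ~48 bytes of fields, headers, and alignment padding an
on-heap Segment/ReentrantLock pair would carry.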
Whether it's too much data on heap is a question of how big a cache you want
and how small the cached values are. The smaller the cached values, the more
the metadata overhead of the cache (and the JVM overhead) matters.
Locking-wise, if you are only doing spin locks, you can use Unsafe
compare-and-swap to implement a lock in off-heap memory. You do have to be
careful about alignment.
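A minimal sketch of that idea: a spin lock whose state lives in a 4-byte word
in off-heap memory, acquired with `Unsafe.compareAndSwapInt` against an
absolute address (null base object). This assumes the lock word is 4-byte
aligned, since CAS on a misaligned address is undefined on some platforms; the
class and method names are illustrative.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical spin lock stored entirely off heap. The lock state is a
// 4-byte int at `addr`: 0 = unlocked, 1 = locked.
public class OffHeapSpinLock {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final int UNLOCKED = 0, LOCKED = 1;

    private final long addr; // 4-byte-aligned address of the lock word

    public OffHeapSpinLock(long addr) {
        this.addr = addr;
        UNSAFE.putInt(addr, UNLOCKED);
    }

    public void lock() {
        // Spin until this thread wins the CAS from UNLOCKED to LOCKED.
        // Passing null as the base object makes the offset an absolute address.
        while (!UNSAFE.compareAndSwapInt(null, addr, UNLOCKED, LOCKED)) {
            Thread.yield(); // crude backoff under contention
        }
    }

    public void unlock() {
        UNSAFE.putIntVolatile(null, addr, UNLOCKED);
    }

    public boolean isLocked() {
        return UNSAFE.getIntVolatile(null, addr) == LOCKED;
    }

    /** Allocates a fresh off-heap lock word (malloc'd memory is suitably aligned). */
    public static long allocateLockWord() {
        long a = UNSAFE.allocateMemory(4);
        UNSAFE.putInt(a, UNLOCKED);
        return a;
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```

The lock word could sit right next to the bucket pointer, giving the 8-12
bytes per bucket mentioned above with no on-heap lock objects.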
bq. Nope it is Single threaded Executor and the rehash boolean is already
volatile. Next commit will have conditions instead (similar to C
implementation).
The task submitted to the executor doesn't check whether another rehash is
required; it just does it. And the check before submitting a task to do
rehashing appears to have a race where two threads could submit the task at
the same time: there is no isolation between the threads as they read the
volatile field and then write to it. Two or more threads could each read and
see that no rehash is in progress, update the value to in-progress, and then
submit the task.
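One way to close that check-then-act race is a compare-and-set on an
AtomicBoolean, which makes the read and the write a single atomic step so only
one thread can enqueue the rehash. A sketch, with illustrative names
(`RehashGate`, `maybeRehash`) that are not from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard around rehash submission: compareAndSet ensures exactly
// one thread flips the flag from false to true, so only that thread submits.
class RehashGate {
    private final AtomicBoolean rehashInProgress = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void maybeRehash() {
        // Losers of the CAS see `true` and return without submitting.
        if (rehashInProgress.compareAndSet(false, true)) {
            executor.submit(() -> {
                try {
                    doRehash();
                } finally {
                    rehashInProgress.set(false); // allow the next rehash
                }
            });
        }
    }

    void doRehash() { /* rebuild the bucket table */ }
}
```

With a plain volatile read followed by a volatile write, two threads can
interleave between the read and the write; compareAndSet removes that window.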
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch
>
>
> Currently SerializingCache is partially off heap; keys are still stored in
> the JVM heap as BB,
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * The memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing APIs (ICache), and the implementation
> needs to have safe memory access, low memory overhead, and as few memcpys as
> possible.
> We might also want to make this cache configurable.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)