[
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225371#comment-14225371
]
Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------
bq. Are you suggesting a configurable logging/exception handling in case the 2
threads throw exceptions? If yes sure. Other exceptions AFAIK are already
propagated. (Still needs cleanup though).
Something has to happen to the exceptions generated there. Since this is a
library and there is no caller to propagate them to, it implies that people
need to provide a listener or a logger.
bq. Segments are small in memory so far in my tests,
Segments are hash buckets, correct? They aren't segments of several hash
buckets. If the goal of the hash table is to have at most two or three entries
per segment, then having an on-heap Java object per segment would be a lot of
overhead. Just as a guess, we are talking about two objects: the
Segment/ReentrantLock, and the AbstractQueuedSynchronizer allocated by
ReentrantLock, which has three additional fields. That's 48 bytes without
alignment or object headers. There is also the overhead of having an
AtomicReferenceArray of pointers to each segment object. A hash table bucket
only has to be a pointer plus a lock field if you are going to lock buckets,
and you could do that in 8-12 bytes.
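To make the comparison concrete, here is a minimal sketch of the off-heap
bucket layout described above: one 8-byte pointer slot per bucket, allocated
with Unsafe, with no per-bucket Java objects at all. The names (`BucketTable`,
`bucketAddress`, the bucket size) are illustrative assumptions, not code from
the attached patch.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical off-heap bucket table: each bucket is a single 8-byte slot
// holding the address of the first entry in its chain (0 means empty).
public class BucketTable {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final int BUCKET_SIZE = 8; // one 64-bit pointer per bucket

    private final long base;   // address of the bucket array
    private final int buckets; // number of buckets

    public BucketTable(int buckets) {
        this.buckets = buckets;
        long bytes = (long) buckets * BUCKET_SIZE;
        this.base = UNSAFE.allocateMemory(bytes);
        UNSAFE.setMemory(base, bytes, (byte) 0); // all buckets start empty
    }

    // Address of the bucket slot for a given hash.
    long bucketAddress(int hash) {
        int idx = (hash & 0x7fffffff) % buckets;
        return base + (long) idx * BUCKET_SIZE;
    }

    long head(int hash)                  { return UNSAFE.getLong(bucketAddress(hash)); }
    void setHead(int hash, long entry)   { UNSAFE.putLong(bucketAddress(hash), entry); }

    public void free() { UNSAFE.freeMemory(base); }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```

At 8 bytes per bucket (or 12 with a separate lock word), the per-bucket cost is
a small fraction of the ~48 bytes of fields, headers, and alignment padding an
on-heap Segment/ReentrantLock pair would carry.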
Whether it's too much data on heap is a question of how big a cache you want
and how small the cached values are. The smaller the cached values, the more
the metadata overhead of the cache (and the JVM overhead) matters.
Locking-wise, if you are only doing spin locks, you can use Unsafe
compare-and-swap to implement a lock in off-heap memory. You do have to be
careful about alignment.
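A minimal sketch of that idea: a spin lock whose state lives in a 4-byte word
in off-heap memory, acquired with `Unsafe.compareAndSwapInt` against an
absolute address (null base object). This assumes the lock word is 4-byte
aligned, since CAS on a misaligned address is undefined on some platforms; the
class and method names are illustrative.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical spin lock stored entirely off heap. The lock state is a
// 4-byte int at `addr`: 0 = unlocked, 1 = locked.
public class OffHeapSpinLock {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final int UNLOCKED = 0, LOCKED = 1;

    private final long addr; // 4-byte-aligned address of the lock word

    public OffHeapSpinLock(long addr) {
        this.addr = addr;
        UNSAFE.putInt(addr, UNLOCKED);
    }

    public void lock() {
        // Spin until this thread wins the CAS from UNLOCKED to LOCKED.
        // Passing null as the base object makes the offset an absolute address.
        while (!UNSAFE.compareAndSwapInt(null, addr, UNLOCKED, LOCKED)) {
            Thread.yield(); // crude backoff under contention
        }
    }

    public void unlock() {
        UNSAFE.putIntVolatile(null, addr, UNLOCKED);
    }

    public boolean isLocked() {
        return UNSAFE.getIntVolatile(null, addr) == LOCKED;
    }

    /** Allocates a fresh off-heap lock word (malloc'd memory is suitably aligned). */
    public static long allocateLockWord() {
        long a = UNSAFE.allocateMemory(4);
        UNSAFE.putInt(a, UNLOCKED);
        return a;
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```

The lock word could sit right next to the bucket pointer, giving the 8-12
bytes per bucket mentioned above with no on-heap lock objects.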
bq. Nope it is Single threaded Executor and the rehash boolean is already
volatile. Next commit will have conditions instead (similar to C
implementation).
The task submitted to the executor doesn't check whether another rehash is
required; it just does it. And the check before submitting a task to do
rehashing appears to have a race where two threads could submit the task at
the same time: there is no isolation between the threads as they read the
volatile field and then write to it. Two or more threads could each read and
see that no rehash is in progress, update the value to in-progress, and then
submit the task.
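One way to close that check-then-act race is a compare-and-set on an
AtomicBoolean, which makes the read and the write a single atomic step so only
one thread can enqueue the rehash. A sketch, with illustrative names
(`RehashGate`, `maybeRehash`) that are not from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard around rehash submission: compareAndSet ensures exactly
// one thread flips the flag from false to true, so only that thread submits.
class RehashGate {
    private final AtomicBoolean rehashInProgress = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void maybeRehash() {
        // Losers of the CAS see `true` and return without submitting.
        if (rehashInProgress.compareAndSet(false, true)) {
            executor.submit(() -> {
                try {
                    doRehash();
                } finally {
                    rehashInProgress.set(false); // allow the next rehash
                }
            });
        }
    }

    void doRehash() { /* rebuild the bucket table */ }
}
```

With a plain volatile read followed by a volatile write, two threads can
interleave between the read and the write; compareAndSet removes that window.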
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch
>
>
> Currently SerializingCache is partially off heap; keys are still stored in
> the JVM heap as BB,
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * The memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing APIs (ICache), and the implementation
> needs to have safe memory access, low memory overhead, and as few memcpys as
> possible.
> We might also want to make this cache configurable.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)