[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

Ariel Weisberg (JIRA) Mon, 12 Jan 2015 15:40:14 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274409#comment-14274409
 ]


Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

I did another review. The additional test coverage looks great.

Don’t throw Error, throw runtime exceptions on things like serialization 
issues. The only place it make sense to throw error is when allocating memory 
fails. That would match the behavior of ByteBuffer.allocateDirect. I don’t see 
failure to allocate from the heap allocator as recoverable even in the context 
of the cache. IOError is thrown from one place in the entire JDK (Console) so 
it's an odd choice.

freeCapacity should reall be a field inside each segment and full/not full and 
eviction decisions should be made inside each segment independently. In 
practice inside C* it’s probably fine as just an AtomicLong, but I want to see 
OHC be all it can be.

Rehash test could validate the data. After the rehash. It could also validate 
the rehash under concurrent access, say have a  reader thread that is randomly 
accessing values < the already inserted value.I can’t tell if the crosscheck 
test inserts enough values to trigger rehashing.

Inlining the murmur3 changes makes me a little uncomfortable. It’s good see see 
some test coverage comparing with another implementation, but it’s over a small 
set of data. It seems like the Unsigned stuff necessary to perfectly mimic the 
native version of murmur3 is missing?

Add 2-4 byte coed points for the UTF-8 tests.

FastUtil is a 17 megabyte dependency all to get one array list.

The cross checking implementation is really nice.

Looking at the AbstractKeyIterator, I don’t see how it can do the right thing 
when a segment rehashes. It will point to a random spot in the segment after a 
rehash right? In practice maybe this doesn’t matter since they should size up 
promptly and it’s just an optimization that we dump this stuff at all. I can 
understand what the current code does so I lean towards keeping it.

There are a couple of places (serializeForPut, putInternal, maybe others) where 
there are two exception handlers that each de-allocate the same piece of 
memory. The deallocation could go in a finally instead of the exception 
handlers since it always happens.


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Robert Stupp
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap, keys are still stored in 
> JVM heap as BB, 
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tunning.
> * Overhead in Memory for the cache entries are relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with cache. We might want to ensure that the new 
> implementation match the existing API's (ICache), and the implementation 
> needs to have safe memory access, low overhead in memory and less memcpy's 
> (As much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

Reply via email to