[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262472#comment-14262472 ]

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

I have an in-progress response to your earlier comment; I'll address the 
benchmark here.
 
I wouldn't sweat allocator performance. Ultimately we will have to write our 
own, if only to accurately enforce memory utilization (the user asks for 200 
megabytes, we use 400, not cool). I think the blueprint for how to allocate 
and defragment already exists in something like memcached; we just need to 
adapt it to our approach of a pool of independently locked hash tables.
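
To make the accounting concrete, here is a rough sketch of what I mean (the 
names and numbers are made up, and a real implementation would keep the table 
itself off heap): each segment is its own locked table with a strict byte 
budget, and puts are charged for the actual allocation including per-entry 
overhead, not just the payload the user handed us.

{code:java}
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class StripedOffHeapCache
{
    // Assumed fixed per-entry overhead (headers, alignment). The point is to
    // charge the budget for what we actually allocate, not just the payload.
    private static final int ENTRY_OVERHEAD = 64;

    private final Segment[] segments;

    public StripedOffHeapCache(int stripes, long totalCapacityBytes)
    {
        segments = new Segment[stripes];
        for (int i = 0; i < stripes; i++)
            segments[i] = new Segment(totalCapacityBytes / stripes);
    }

    private Segment segmentFor(byte[] key)
    {
        int h = java.util.Arrays.hashCode(key);
        return segments[(h & 0x7fffffff) % segments.length];
    }

    public boolean put(byte[] key, byte[] value) { return segmentFor(key).put(key, value); }
    public byte[] get(byte[] key) { return segmentFor(key).get(key); }

    private static final class Segment
    {
        private final ReentrantLock lock = new ReentrantLock();
        // Access-ordered map gives LRU iteration order; a real implementation
        // would keep keys and values off heap instead of in this on-heap map.
        private final LinkedHashMap<ByteBuffer, byte[]> map =
            new LinkedHashMap<ByteBuffer, byte[]>(16, 0.75f, true);
        private final long capacityBytes;
        private long usedBytes;

        Segment(long capacityBytes) { this.capacityBytes = capacityBytes; }

        private static long charge(byte[] k, byte[] v)
        {
            return ENTRY_OVERHEAD + k.length + v.length;
        }

        boolean put(byte[] key, byte[] value)
        {
            long c = charge(key, value);
            if (c > capacityBytes)
                return false; // refuse rather than over-commit the budget
            lock.lock();
            try
            {
                ByteBuffer k = ByteBuffer.wrap(key);
                byte[] old = map.remove(k);
                if (old != null)
                    usedBytes -= charge(key, old);
                // Evict in LRU order until the actual allocation fits.
                Iterator<Map.Entry<ByteBuffer, byte[]>> it = map.entrySet().iterator();
                while (usedBytes + c > capacityBytes && it.hasNext())
                {
                    Map.Entry<ByteBuffer, byte[]> e = it.next();
                    usedBytes -= charge(e.getKey().array(), e.getValue());
                    it.remove();
                }
                map.put(k, value);
                usedBytes += c;
                return true;
            }
            finally
            {
                lock.unlock();
            }
        }

        byte[] get(byte[] key)
        {
            lock.lock();
            try { return map.get(ByteBuffer.wrap(key)); }
            finally { lock.unlock(); }
        }
    }
}
{code}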

The overhead of copying is where zero-copy deserialization and ref-counting 
start to be a win, since you don't have to copy at all. I wouldn't get worked 
up about optimizing for that yet, because it requires upstream to be smarter 
about how it uses the cache. If upstream can parse the cache value and 
extract a subset without copying the entire thing, it will handle larger 
values more gracefully. At some point upstream might also hold partial rows.
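
Roughly the retain/release protocol I have in mind (illustrative only, not 
from the patch): a hit hands back a handle to the off-heap bytes, the reader 
parses in place, and the memory is only freed when the last reference drops.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedEntry
{
    private final ByteBuffer offHeap;                        // the cached bytes
    private final AtomicInteger refs = new AtomicInteger(1); // the cache's own reference

    RefCountedEntry(ByteBuffer offHeap) { this.offHeap = offHeap; }

    // Called on a hit, under the segment lock; fails if the entry was
    // concurrently evicted and already freed.
    boolean retain()
    {
        while (true)
        {
            int r = refs.get();
            if (r == 0)
                return false;
            if (refs.compareAndSet(r, r + 1))
                return true;
        }
    }

    // Zero-copy view: duplicate() shares the underlying memory, so the caller
    // can parse a subset of the value without copying the whole thing.
    ByteBuffer view() { return offHeap.duplicate().asReadOnlyBuffer(); }

    // Eviction calls this for the cache's reference; readers call it when
    // they are done. The last release frees the memory.
    void release()
    {
        if (refs.decrementAndGet() == 0)
            free();
    }

    private void free()
    {
        // With a custom allocator this would return the block to the pool;
        // with a plain direct ByteBuffer we just drop the reference.
    }
}
{code}

The eviction path only releases the cache's own reference, so in-flight 
readers keep the entry alive until they finish.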

I would like to see the ability to spin all cores against the cache, at least 
for relatively small values. Not being able to do that is a little concerning. 
Are threads blocking inside the allocator? Do the utilization issues occur with 
large or small values?

I don't have a real baseline to tell whether these numbers are good or bad. 
They sound okay, and as you say, you would expect the allocator to be one of 
the slowest parts. I am not sure testing with 500 threads is realistic, since 
threads have a pretty good chance of being descheduled while holding a lock, 
and that isn't as likely to happen under real usage conditions. I would test 
with, say, 30 threads on that hardware. 

For, say, 16k values, measuring scaling from 1 to 30 threads would give us an 
idea of how well things are going. That would also give you better feedback 
on whether different numbers of stripes help or not; a rough harness sketch 
follows.
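
Something along these lines would do (a hypothetical harness, assuming a 
cache with a pluggable stripe count like the sketch above): fix the value 
size at 16k, sweep the thread count, and print throughput so runs with 
different stripe counts can be compared.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ScalingBench
{
    public static void main(String[] args) throws Exception
    {
        final byte[] value = new byte[16 * 1024]; // fixed 16k values
        for (int threads = 1; threads <= 30; threads++)
        {
            // 64 stripes, 1 GB budget; vary the stripe count across runs to
            // see whether more stripes actually help.
            final StripedOffHeapCache cache = new StripedOffHeapCache(64, 1L << 30);
            final AtomicLong ops = new AtomicLong();
            final long endAt = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int t = 0; t < threads; t++)
            {
                pool.execute(new Runnable()
                {
                    public void run()
                    {
                        ThreadLocalRandom rnd = ThreadLocalRandom.current();
                        long n = 0;
                        while (System.nanoTime() < endAt)
                        {
                            byte[] key = Long.toString(rnd.nextLong(100000)).getBytes();
                            if (rnd.nextInt(10) == 0)
                                cache.put(key, value); // ~10% writes
                            else
                                cache.get(key);
                            n++;
                        }
                        ops.addAndGet(n);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.printf("%2d threads: %,d ops in 10s%n", threads, ops.get());
        }
    }
}
{code}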

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap; keys are still stored in 
> the JVM heap as ByteBuffers.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely 
> off heap and use JNI to interact with the cache. We might want to ensure 
> that the new implementation matches the existing APIs (ICache), and the 
> implementation needs to have safe memory access, low memory overhead, and 
> as few memcpys as possible.
> We might also want to make this cache configurable.



