[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

Ariel Weisberg (JIRA) Tue, 09 Dec 2014 09:39:46 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239729#comment-14239729
 ]


Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

Lot's of cool stuff here Robert.

Unit test wise there is a lot of code that is only covered indirectly or not at 
all and the behaviors are not checked for explicitly. I don't think it makes 
sense to include code that doesn't doesn't have a unit test claiming it does 
what is says it does. The various input/output streams, buffering, and 
compression all need units tests. Uns.java needs a unit test for pretty much 
every method as well as the validation functionality. HashEntry has a bunch of 
uncovered functions. For me the lack of test coverage is the biggest barrier.

What I am reacting to is that the tests are black box and miss things. 
OffHeapMap containsEntry has no tests. removeEntry has untested code. 
removeLink still has untested code. There is untested histogram stuff, 
deserializeEntry, serializeEntry. HashEntry classes have untested functions. 
HashEntries has many predicates that are untested.

Having a unit test that fuzzes against a parallel implementation at the same 
time using a different LRU map implementation would be great for a black box 
test. You can stripe the other implementation the same way so that the eviction 
matches.

One of my previous comments was that SegmentCacheImpl duplicates reference 
counting code from OffHeapMap and should just delegate. It ends up doing that 
anyways.

I would really like to see the cleanup/eviction code go away. If inserting an 
entry would blow capacity remove entries until it doesn't. I don't see a reason 
to monkey with thresholds. 

At some point the existing C* cache interface needs to gel with your work. 
Right now C* uses the hotN and getKeys interface to return the contents of the 
cache for persistence. I think the path of least resistance to start would be 
to implement the existing interface and then come back and look at how to get 
compression and more efficient IO into all the implementations. The existing 
stuff in C* doesn't do compression and doesn't buffer its IO. I would prefer to 
minimize major changes to the existing C* code. I want to get it working and 
then iterate further for other improvements like more efficient cache 
serialization.

You could change the OHC interface or implement an adapter. I think it's fine 
to modify ICache to return iterables or iterators instead of a collections to 
incrementally produce key set and hot keys. For everything else I would really 
like to see things to stay the same unless there is something to be gained by 
changing the interface.

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap, keys are still stored in 
> JVM heap as BB, 
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tunning.
> * Overhead in Memory for the cache entries are relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with cache. We might want to ensure that the new 
> implementation match the existing API's (ICache), and the implementation 
> needs to have safe memory access, low overhead in memory and less memcpy's 
> (As much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)

Reply via email to