[ 
https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861886#comment-13861886
 ] 

Benedict commented on CASSANDRA-5549:
-------------------------------------

I have a patch available for this 
[here|https://github.com/belliottsmith/cassandra/tree/only-5549]

I've been a little reluctant to post it, as it's a bit of a monster of a patch, 
but I think I've now done my best to keep it well commented and to limit 
unnecessary changes. Some changes may appear over-engineered for their current 
use, but I rely on them in a continuation of this patch for off-heap 
memtables. I'll describe some of these below; unpicking changes that are still 
useful seemed wasteful. If they get in the way of review, we can revisit that 
decision.

There are several main areas of updates:

1) Removal of switchLock itself: The main work here is actually in the 
OpOrdering synchronisation class. This class explains itself, so I won't go 
into detail here, but it provides an easy mechanism for coordinating our 
updates to Memtables, so that we know what CL position each one contains data 
up to and when a memtable is safe to be written to disk. The actual flushing 
of the memtable has also been refactored a little, to keep ordering 
guarantees.
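
As a rough illustration of the idea only (this is not the OpOrdering class 
from the patch, and the names OpOrderSketch, Group, start() and issueBarrier() 
are hypothetical): writers register in the current group without taking a 
lock, and the flusher issues a barrier that closes that group and then waits 
for the operations in it to finish. The sketch assumes a single flusher that 
awaits each closed group before issuing the next barrier.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the OpOrdering class from the patch.
final class OpOrderSketch {

    /** The set of operations that started before the same barrier. */
    static final class Group {
        // >= 0: group is open, value is the number of running operations.
        // <  0: group is closed, -(value + 1) operations are still running.
        private final AtomicInteger state = new AtomicInteger(0);

        private boolean tryStart() {
            while (true) {
                int s = state.get();
                if (s < 0) return false; // closed: caller retries on the new group
                if (state.compareAndSet(s, s + 1)) return true;
            }
        }

        private void close() {
            state.updateAndGet(s -> -(s + 1));
        }

        /** Marks one operation in this group as complete. */
        void finish() {
            while (true) {
                int s = state.get();
                int next = s < 0 ? s + 1 : s - 1;
                if (state.compareAndSet(s, next)) {
                    // -1 means closed with nothing running: wake any waiter.
                    if (next == -1) synchronized (this) { notifyAll(); }
                    return;
                }
            }
        }

        boolean isFinished() {
            return state.get() == -1;
        }

        /** Blocks until the group is closed and all of its operations are done. */
        synchronized void await() throws InterruptedException {
            while (state.get() != -1) wait();
        }
    }

    private volatile Group current = new Group();

    /** Called by a writer before it touches the memtable. */
    Group start() {
        while (true) {
            Group g = current;
            if (g.tryStart()) return g;
        }
    }

    /** Called by the flusher: later writers land in a fresh group. */
    synchronized Group issueBarrier() {
        Group old = current;
        current = new Group(); // publish the new group first ...
        old.close();           // ... then refuse further starts on the old one
        return old;
    }
}
```

A writer would call start(), apply its update, and call finish() on the 
returned group; the flusher would call issueBarrier().await() before treating 
the old memtable as complete.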

2) Allocators and Memory Management: by removing the switch lock, we lose our 
ability to control heap growth caused by row mutations. To fix this, I've 
created the concept of a PoolAllocator, with an associated Pool that has fixed 
memory limits. Any allocation requires the pool to allot room from its limit 
to the allocator (this is dealt with by MemoryTracker and MemoryOwner). This 
required a lot of minor modifications all over the place, to make measurement 
of object sizes at modification time cheap and accurate. Mostly I've achieved 
this by modifying jamm so that it will always give us a useful answer; a new 
branch is [here|https://github.com/belliottsmith/jamm/tree/guess]. Wherever we 
used to use ObjectSizes ad hoc in a class (generally incorrectly, it turns 
out, which is not surprising as the API isn't obvious), I now *always* call 
measure() on an instance of the object and store the result in a static field, 
and use simpler methods for any dynamic space use.
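
To make the pool/allocator relationship concrete, here is a minimal sketch 
under my own assumptions; PoolSketch and its methods are hypothetical names 
and do not reflect the API of Pool, PoolAllocator, MemoryTracker or 
MemoryOwner in the patch. In particular, this sketch returns null when the 
pool is exhausted, which is a simplification chosen for the example.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the Pool/PoolAllocator classes from the patch.
final class PoolSketch {
    private final long limitBytes;
    private final AtomicLong allotted = new AtomicLong();

    PoolSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves room from the pool's fixed limit, or fails if none is left. */
    private boolean tryAllot(long bytes) {
        while (true) {
            long cur = allotted.get();
            if (cur + bytes > limitBytes) return false;
            if (allotted.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    long used() {
        return allotted.get();
    }

    Allocator newAllocator() {
        return new Allocator();
    }

    /** One allocator per memtable (an assumption of this sketch). */
    final class Allocator {
        private final AtomicLong owned = new AtomicLong();

        /** Returns a buffer, or null when the pool has no room left. */
        ByteBuffer allocate(int size) {
            if (!tryAllot(size)) return null;
            owned.addAndGet(size);
            return ByteBuffer.allocate(size);
        }

        /** Hands everything this allocator owns back to the pool, e.g. after a flush. */
        void releaseAll() {
            allotted.addAndGet(-owned.getAndSet(0));
        }
    }
}
```

The point of the design is that the limit is enforced at allocation time by 
the shared pool, rather than by blocking writers on a lock.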

Worth noting: I've renamed IMeasureableMemory.memorySize() to excessHeapSize(), 
and I've modified its value (where applicable) to count only data we wouldn't 
otherwise be storing. This only makes a difference in a few places, but I think 
it is an important distinction.

This change also makes any limit on flush queue size irrelevant. The metric we 
use for controlling flushing is instead the ratio of in-use memory to the 
memory limit, ignoring any data that is already flushing; once that ratio 
passes its threshold, we trigger a flush of the largest CFS.
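
The flush decision described above might be sketched as follows. This is an 
illustration only: FlushTriggerSketch, pickFlushTarget and the threshold 
parameter are hypothetical, and the map is assumed to hold only live bytes per 
CFS, with data that is already flushing excluded by the caller.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the flush logic from the patch.
final class FlushTriggerSketch {

    /**
     * Returns the name of the CFS to flush, or empty if usage is within bounds.
     *
     * @param liveBytesByCfs in-use bytes per CFS, excluding data already flushing
     * @param limitBytes     the pool's fixed memory limit
     * @param threshold      ratio of in-use memory to limit that triggers a flush
     */
    static Optional<String> pickFlushTarget(Map<String, Long> liveBytesByCfs,
                                            long limitBytes,
                                            double threshold) {
        long live = liveBytesByCfs.values().stream()
                                  .mapToLong(Long::longValue)
                                  .sum();
        if ((double) live / limitBytes <= threshold) return Optional.empty();
        // Over the threshold: flush whichever CFS holds the most live data.
        return liveBytesByCfs.entrySet().stream()
                             .max(Map.Entry.comparingByValue())
                             .map(Map.Entry::getKey);
    }
}
```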

3) Some concurrency primitives: NonBlockingQueue (and related classes) and 
WaitQueue. NonBlockingQueue is used more extensively in the off-heap changes, 
but I've left it in here because it improves WaitQueue a lot, and we rely on 
WaitQueue much more with the proliferation of the OpOrdering operations. It 
also moves us much closer to completely non-blocking read/write operations, 
and we use it to get rid of the Thread.yield() in SlabAllocator. I've aimed to 
keep NBQ as simple as possible.

4) CommitLog has been updated to use OpOrdering, and also includes a bug fix. I 
considered splitting this into a separate ticket, but it's such a tiny 
proportion of the overall changes I'm not sure it warrants it. The bug fix we 
may want to split out if this takes a while to go through.


> Remove Table.switchLock
> -----------------------
>
>                 Key: CASSANDRA-5549
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write 
> path.  ReentrantReadWriteLock is not lightweight, even if there is no 
> contention per se between readers and writers of the lock (in Cassandra, 
> memtable updates and switches).


