[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061511#comment-14061511 ]

graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

The idea of the change here (which has three copies of the original loop, which 
you could argue is uglier and more complicated, but in some ways is also 
clearer) is to try to minimize the number of loop attempts which allocate 
memory and then fail the CAS. For reference, the original pattern is sketched 
below.
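
A minimal sketch of the pre-patch pattern, reduced to its essentials 
(illustrative names, not the actual Cassandra code): every losing iteration 
re-clones the state, so heavy contention turns directly into heavy allocation.

{code:java}
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class SpinCloneCas<T>
{
    final AtomicReference<T> state;

    SpinCloneCas(T initial) { state = new AtomicReference<>(initial); }

    void update(UnaryOperator<T> cloneAndApply)
    {
        T current, modified;
        do
        {
            current = state.get();
            modified = cloneAndApply.apply(current); // allocates on every attempt
        }
        while (!state.compareAndSet(current, modified));
    }
}
{code}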

The first "raw" loop is largely identical to the old code
 - there is a check of a new non volatile Holder variable, but does no new 
memory barrier requiring concurrency related checking
 - the CAS is not attempted (as part of the loop condition) if the loop 
continues due to early-out
 - I moved the deletioninfo stuff after the columns
 - I deferred the creation of the modified Holder until the last minute (mostly 
because in the other loops the choice of the last constructor parameter is best 
made last, and I wanted to keep all versions of the loop with the same 
structure)
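
Roughly the shape of the "raw" loop as described above; Holder, applyMutation 
and ref are simplified stand-ins for the real members (the actual code is in 
suggestion1.txt):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class AdaptiveCasSketch
{
    static final class Holder
    {
        // null => no contention observed; non-null => use the "count" loop
        final AtomicInteger contentionCount;

        Holder(AtomicInteger contentionCount) { this.contentionCount = contentionCount; }

        // stand-in for clone + apply of the incoming columns/deletion info
        Holder applyMutation(AtomicInteger next) { return new Holder(next); }
    }

    final AtomicReference<Holder> ref = new AtomicReference<>(new Holder(null));
    final Object monitor = new Object();

    void addAllWithSizeDelta()
    {
        boolean failedBefore = false;
        while (true)
        {
            Holder current = ref.get();
            // early-out: a plain (non-volatile) field read, no extra barrier;
            // skip the CAS entirely and switch loops if contention is flagged
            if (current.contentionCount != null)
            {
                countLoop();
                return;
            }
            // create the modified Holder at the last minute; a thread that
            // previously lost a CAS installs AtomicInteger(0) when it wins,
            // directing future traffic to the "count" loop
            Holder modified = current.applyMutation(failedBefore ? new AtomicInteger(0) : null);
            if (ref.compareAndSet(current, modified))
                return;
            failedBefore = true;
        }
    }

    void countLoop() { /* sketched below */ }

    void syncLoop() { /* sketched below */ }
}
{code}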

The Holder instance now tracks a state machine via the new contentionCount 
AtomicInteger variable. The first loop is used while contentionCount is null. 
Any first-loop thread that fails the first CAS, and then succeeds on a later 
first-loop iteration, will install an AtomicInteger(0) value, directing future 
traffic to the second "count" loop.
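
As a hypothetical summary of the resulting dispatch (the patch expresses this 
inside the loops themselves rather than in a separate method):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

final class LoopChoice
{
    enum Loop { RAW, COUNT, SYNC }

    static Loop choose(AtomicInteger contentionCount)
    {
        if (contentionCount == null)
            return Loop.RAW;      // no contention observed yet
        if (contentionCount.get() > 1)
            return Loop.SYNC;     // concurrent callers: serialize on a monitor
        return Loop.COUNT;        // contention being counted
    }
}
{code}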

The "count" loop incudes a try/finally and tracks the number of concurrent 
calls in the contentionCount variable... if the contention count is >1 traffic 
is directed to the third "sync" loop. This loop will also on successful CAS 
deflate contentionCount to null, if it was at 1 (i.e. there wasn't any current 
contention) just before the CAS
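
Roughly the shape of the "count" loop, continuing the AdaptiveCasSketch 
skeleton above; the exact escalation and deflation ordering is my reading of 
the description, not lifted from the patch:

{code:java}
void countLoop()
{
    while (true)
    {
        Holder current = ref.get();
        AtomicInteger cc = current.contentionCount;
        if (cc == null)
        {
            addAllWithSizeDelta(); // deflated while we spun: back to "raw"
            return;
        }
        int concurrent = cc.incrementAndGet(); // count this concurrent call
        boolean escalate = false;
        try
        {
            if (concurrent > 1)
            {
                escalate = true; // another caller is already in flight
            }
            else
            {
                // still the only counted caller just before the CAS:
                // deflate contentionCount back to null
                Holder modified = current.applyMutation(cc.get() == 1 ? null : cc);
                if (ref.compareAndSet(current, modified))
                    return;
            }
        }
        finally
        {
            cc.decrementAndGet(); // the try/finally keeps the count accurate
        }
        if (escalate)
        {
            syncLoop();
            return;
        }
    }
}
{code}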

The "sync" loop is protected by a monitor (note choice here over j.u.c.Lock 
because of Lock owner tracking overhead). This loop can still lose CAS, if 
another AtomicSortedColumn function is doing the CAS, or another thread is 
CASing in either the "raw" or the "count" loop still.
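
Roughly the shape of the "sync" loop, continuing the same skeleton (monitor is 
the plain Object declared earlier, per the note above about j.u.c.Lock 
overhead):

{code:java}
void syncLoop()
{
    synchronized (monitor)
    {
        while (true)
        {
            Holder current = ref.get();
            Holder modified = current.applyMutation(current.contentionCount);
            // the CAS can still fail: other AtomicSortedColumns methods, or
            // threads still in the "raw"/"count" loops, may beat us to it
            if (ref.compareAndSet(current, modified))
                return;
        }
    }
}
{code}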

The overall design is intended to be effectively free in the uncontended case, 
to adapt to being effectively fully synchronized in the highly contended case 
(for which the original code was potentially really bad), and to be as good as 
or better than the original code in between.

To that end, the debug code currently in the patch tracks the ratio of loop 
iterations attempted (i.e. iterations that do some work before either failing 
the CAS or early-outing because they know the CAS would fail) to successful 
updates. Ideally this ratio would be 1, and it should remain close to that in 
all cases.
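
One possible shape for those debug counters (hypothetical names; the patch's 
actual instrumentation may differ):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

final class CasDebugStats
{
    static final AtomicLong attempts  = new AtomicLong(); // loop bodies entered
    static final AtomicLong successes = new AtomicLong(); // CASes that stuck

    // attempts per successful update; ideally this stays close to 1.0
    static double ratio()
    {
        long s = successes.get();
        return s == 0 ? 0.0 : (double) attempts.get() / s;
    }
}
{code}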

> AtomicSortedColumns.addAllWithSizeDelta has a spin lock that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: graham sanderson
>         Attachments: suggestion1.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse 
> it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common uncontended case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
