[
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061511#comment-14061511
]
graham sanderson commented on CASSANDRA-7546:
---------------------------------------------
The idea of the change here (which has three copies of the original loop,
which you could argue is uglier and more complicated, but is in some ways
also clearer) is to minimize the number of loop attempts that allocate memory
and then fail the CAS.
The first "raw" loop is largely identical to the old code
- there is a check of a new non-volatile Holder field, but this check
requires no new memory barriers or other concurrency machinery
- the CAS (part of the loop condition) is not attempted if the loop continues
due to an early-out
- I moved the deletionInfo handling after the columns handling
- I deferred creation of the modified Holder until the last moment (mostly
because in the other loops the choice of the last constructor parameter is
best made last, and I wanted to keep all versions of the loop with the same
structure)
The Holder instance now tracks a state machine via the new contentionCount
AtomicInteger field. The first loop is used when contentionCount is null. Any
first-loop thread that fails its first CAS, and then succeeds on a later
first-loop iteration, will install an AtomicInteger(0) value, directing
future traffic to the second "count" loop.
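To make the shape concrete, here is a rough sketch of the "raw" loop with
simplified stand-ins (Holder, ref, merge are placeholders, not the actual
patch code):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class RawLoopSketch {
    // Simplified Holder: a columns placeholder plus the new state-machine
    // field; contentionCount == null means callers may still use this loop
    static final class Holder {
        final Object columns;
        final AtomicInteger contentionCount;
        Holder(Object columns, AtomicInteger contentionCount) {
            this.columns = columns;
            this.contentionCount = contentionCount;
        }
    }

    final AtomicReference<Holder> ref =
            new AtomicReference<>(new Holder(new Object(), null));

    void addAllRaw(Object update) {
        boolean lostRace = false;
        while (true) {
            Holder current = ref.get();
            if (current.contentionCount != null) {
                addAllCounted(update); // early-out: no CAS attempted
                return;
            }
            Object merged = merge(current.columns, update); // the allocation
            // Holder creation is deferred to the last moment; a thread that
            // previously lost a CAS installs AtomicInteger(0) on success,
            // directing future traffic to the "count" loop
            AtomicInteger state = lostRace ? new AtomicInteger(0) : null;
            if (ref.compareAndSet(current, new Holder(merged, state)))
                return;
            lostRace = true;
        }
    }

    void addAllCounted(Object update) { /* the "count" loop, below */ }

    Object merge(Object columns, Object update) { return update; } // stand-in
}
{code}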
The "count" loop includes a try/finally and tracks the number of concurrent
callers in the contentionCount variable... if the contention count is >1,
traffic is directed to the third "sync" loop. On a successful CAS this loop
will also deflate contentionCount back to null if it was at 1 just before the
CAS (i.e. there wasn't any current contention).
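Roughly, with the same simplified stand-ins (a sketch of the shape, not the
patch itself):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class CountLoopSketch {
    static final class Holder {
        final Object columns;
        final AtomicInteger contentionCount;
        Holder(Object columns, AtomicInteger contentionCount) {
            this.columns = columns;
            this.contentionCount = contentionCount;
        }
    }

    final AtomicReference<Holder> ref = new AtomicReference<>(
            new Holder(new Object(), new AtomicInteger(0)));

    void addAllCounted(Object update) {
        while (true) {
            Holder current = ref.get();
            AtomicInteger count = current.contentionCount;
            if (count == null) {
                addAllRaw(update); // a successful CAS deflated us back to "raw"
                return;
            }
            // try/finally guarantees the concurrent-caller count is restored
            int concurrent = count.incrementAndGet();
            try {
                if (concurrent > 1) {
                    addAllSynchronized(update); // contended: use the "sync" loop
                    return;
                }
                Object merged = merge(current.columns, update);
                // deflate to null only if no other caller is active just
                // before the CAS (i.e. the count is still at 1)
                AtomicInteger state = count.get() == 1 ? null : count;
                if (ref.compareAndSet(current, new Holder(merged, state)))
                    return;
            } finally {
                count.decrementAndGet();
            }
        }
    }

    void addAllRaw(Object update) { /* the "raw" loop, above */ }
    void addAllSynchronized(Object update) { /* the "sync" loop, below */ }
    Object merge(Object columns, Object update) { return update; } // stand-in
}
{code}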
The "sync" loop is protected by a monitor (chosen here over a j.u.c.Lock
because of the Lock owner-tracking overhead). This loop can still lose the
CAS if another AtomicSortedColumns method is doing a CAS, or if another
thread is still CASing in either the "raw" or the "count" loop.
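A sketch of the monitor-protected loop (same simplified stand-ins):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class SyncLoopSketch {
    static final class Holder {
        final Object columns;
        final AtomicInteger contentionCount;
        Holder(Object columns, AtomicInteger contentionCount) {
            this.columns = columns;
            this.contentionCount = contentionCount;
        }
    }

    final AtomicReference<Holder> ref = new AtomicReference<>(
            new Holder(new Object(), new AtomicInteger(0)));
    // a plain monitor avoids the owner-tracking overhead of a j.u.c.Lock
    private final Object monitor = new Object();

    void addAllSynchronized(Object update) {
        synchronized (monitor) {
            while (true) {
                Holder current = ref.get();
                Object merged = merge(current.columns, update);
                // the CAS can still fail here: other AtomicSortedColumns
                // methods, and threads still in the "raw" or "count" loops,
                // CAS the same reference
                Holder next = new Holder(merged, current.contentionCount);
                if (ref.compareAndSet(current, next))
                    return;
            }
        }
    }

    Object merge(Object columns, Object update) { return update; } // stand-in
}
{code}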
The overall design is intended to be effectively free in the uncontended
case, to adapt to being effectively completely synchronized in the highly
contended case (for which the original code was potentially really bad), and
to be as good as or better than the original code in between.
To that end, the debug code as it stands today tracks the ratio of loop
iterations attempted (i.e. iterations that do some work before either failing
the CAS or early-outing because they know the CAS would fail) to successful
updates... Ideally this ratio would be 1, and it should remain close to that
in all cases.
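The tracking itself can be as simple as a pair of counters along these lines
(hypothetical names, not the actual debug code):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

class ContentionStats {
    static final AtomicLong attempts = new AtomicLong();  // loop bodies entered
    static final AtomicLong successes = new AtomicLong(); // completed updates

    // each loop bumps attempts once per iteration that does work (before a
    // failed CAS or an early-out) and successes once per completed update
    static double iterationsPerUpdate() {
        long done = successes.get();
        return done == 0 ? 1.0 : (double) attempts.get() / done; // ideally ~1
    }
}
{code}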
> AtomicSortedColumns.addAllWithSizeDelta has a spin lock that allocates memory
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-7546
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
> Project: Cassandra
> Issue Type: Bug
> Reporter: graham sanderson
> Attachments: suggestion1.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update,
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some
> fairly staggering memory growth (the more cores on your machine, the worse
> it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same
> partition, hinting today does, and in this case wild memory allocation
> rates (order(s) of magnitude more than expected) can be seen (especially
> when the updates being hinted are small updates to different partitions,
> which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation
> whilst not slowing down the very common un-contended case.
--
This message was sent by Atlassian JIRA
(v6.2#6252)