[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070533#comment-14070533 ]

graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

Well, that makes sense; I hadn't checked whether there was a limit on mutator 
threads - we didn't change it... this probably explains the hard upper bound in 
my synthetic test (which, incidentally, does not do the transformation).

I agree with you on SnapTreeMap: once I saw that the "essentially" free clone 
operation has to acquire a lock (or at least wait until there are no mutations 
in flight), I surmised there were probably dragons there that might cause all 
kinds of nastiness, whether it be pain on concurrent updates to a horribly 
unbalanced tree, or dragging huge amounts of garbage along due to overly lazy 
copy-on-write (again, I didn't look too closely). BTree looks much better (and 
probably does less rebalancing, since it has wider nodes, I think), though as 
discussed it doesn't prevent the underlying race.
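
For reference, the race in question is roughly the following heavily simplified 
Java sketch - Holder and applyUpdate are stand-ins I made up to show the shape 
of addAllWithSizeDelta, not the actual Cassandra classes:

    import java.util.concurrent.atomic.AtomicReference;

    // Simplified read -> clone/update -> CAS pattern; placeholder classes only.
    final class CasSpinSketch
    {
        static final class Holder
        {
            final Object tree; // immutable snapshot (SnapTreeMap clone / BTree)
            Holder(Object tree) { this.tree = tree; }
        }

        private final AtomicReference<Holder> ref =
            new AtomicReference<>(new Holder(null));

        void addAll(Object update)
        {
            while (true)
            {
                Holder current = ref.get();
                // every failed CAS throws this clone away and allocates a new
                // one; this is where the allocation under contention comes from
                Holder modified = new Holder(applyUpdate(current.tree, update));
                if (ref.compareAndSet(current, modified))
                    return;
            }
        }

        private static Object applyUpdate(Object tree, Object update)
        {
            return new Object(); // placeholder for the copy-on-write merge
        }
    }

The more mutator threads racing on one partition, the more fully built clones 
get discarded per successful CAS.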

So, I'll see if I have time to work on this later today, but the plan is... for 
2.0.x (just checking):

a) move the transformation.apply out of the loop and do it once
b) add a one-way flip flag per AtomicSortedColumns instance, flipped when an 
accumulated cost reaches a certain value. I was going to calculate the cost 
delta in each mutator thread (probably adding a log-like measure, e.g. using 
Integer.numberOfLeadingZeros(tree.size()) per failing CAS), though looking at 
SnapTreeMap again (ugh), it seems that tree.size() is not a safe method to call 
in the presence of mutations, so I guess Holder can just track the tree size 
itself (see the sketch after this list)
c) given this is possibly a temporary solution, is it worth exposing the 
"cut-off" value, even undocumented, such that it could be overridden in 
cassandra.yaml? Note the default should be such that most AtomicSortedColumns 
instances never get cut off, since they are not both heavily contended and 
large (which would indicate contended inserts, not updates)
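
To make (a) and (b) concrete, here is a rough sketch of the shape I have in 
mind - NOT the attached patch; the names (FlipFlagSketch, COST_CUTOFF, the 
cassandra.cas_cost_cutoff property) and the exact cost formula are placeholders:

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    // Illustrative only: hoisted transformation + one-way flip to a locked path.
    final class FlipFlagSketch
    {
        static final class Holder
        {
            final Object tree;
            final int size; // tracked here since tree.size() is unsafe under mutation
            Holder(Object tree, int size) { this.tree = tree; this.size = size; }
        }

        // (c) hypothetical override knob; the real one might live in cassandra.yaml
        private static final int COST_CUTOFF =
            Integer.getInteger("cassandra.cas_cost_cutoff", 1024);

        private final AtomicReference<Holder> ref =
            new AtomicReference<>(new Holder(null, 0));
        private final AtomicInteger cost = new AtomicInteger();
        private volatile boolean flipped; // one-way: set once, never cleared

        void addAll(Object rawUpdate)
        {
            Object update = transform(rawUpdate); // (a) applied once, not per retry
            if (flipped)
            {
                addAllLocked(update);
                return;
            }
            while (true)
            {
                Holder current = ref.get();
                if (ref.compareAndSet(current, merge(current, update)))
                    return;
                // (b) log-like cost per failing CAS; one reading of the proposed
                // numberOfLeadingZeros measure (~log2 of the tracked tree size)
                int delta = 32 - Integer.numberOfLeadingZeros(Math.max(1, current.size));
                if (cost.addAndGet(delta) >= COST_CUTOFF)
                {
                    flipped = true;
                    addAllLocked(update);
                    return;
                }
            }
        }

        private synchronized void addAllLocked(Object update)
        {
            // still CAS: stragglers already in the spin loop can race with us,
            // but lock holders no longer waste clones racing each other
            while (true)
            {
                Holder current = ref.get();
                if (ref.compareAndSet(current, merge(current, update)))
                    return;
            }
        }

        private static Holder merge(Holder h, Object update)
        {
            return new Holder(new Object(), h.size + 1); // placeholder COW merge
        }

        private static Object transform(Object u) { return u; } // placeholder
    }

The flip is deliberately one-way: once an instance has proven itself hot it 
stays on the locked path, and the locked path still CASes so that threads that 
entered the spin loop before seeing the flag can't lose updates.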


> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, 
> suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse 
> it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.


