[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070133#comment-14070133 ]

Benedict commented on CASSANDRA-7546:
-------------------------------------

bq. let me know if you want me to take another stab at the patch

We're always keen for competent newcomers to start contributing to the 
project; if you've got the time, that would be great, and I can review. If 
not, I'm happy to make this change.

bq. we probably have hundreds of concurrent mutator threads for them

This should never be the case. By default there are 32 concurrent writers 
permitted, and this should never be raised to more than a small multiple of 
the number of physical cores on the machine (unless running batch CL), so if 
there are hundreds, something is going wrong. Furthermore, it makes very 
little sense that this problem wouldn't also be hit by the same number of 
concurrent large modifications: the race condition is the same, and it only 
becomes easier to hit as each concurrent modifier does more work per attempt.
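
For reference, the shape of the loop in question is roughly the following. 
This is only an illustrative sketch - the class and field names are made up, 
and a plain TreeMap copy stands in for the SnapTreeMap clone/update the real 
code performs:

{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the real holder class; illustrative only.
final class OptimisticSpinSketch
{
    private final AtomicReference<NavigableMap<String, String>> ref =
            new AtomicReference<NavigableMap<String, String>>(
                    new TreeMap<String, String>());

    void addAll(NavigableMap<String, String> update)
    {
        while (true)
        {
            NavigableMap<String, String> current = ref.get();
            // Every attempt builds a fresh modified copy; the larger the
            // update, the longer this takes and the wider the race window.
            NavigableMap<String, String> modified =
                    new TreeMap<String, String>(current);
            modified.putAll(update);
            if (ref.compareAndSet(current, modified))
                return;
            // CAS failed: another writer published first, so the copy we
            // just built is garbage and we spin again, allocating another.
        }
    }
}
{code}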

I decided to take a peek at the SnapTreeMap code, since this didn't make much 
sense, and I see that it behaves very differently with many clone()s than 
with many updates (larger updates would necessarily result in a lower 
incidence/overlap of clone()), as it attempts to allocate epochs. I don't 
really have time to waste digging any deeper, but it seems possible that this 
code path results in a great deal more object allocation (and possibly 
allocations that are not easily collectible) than simply performing many 
large updates. If this is the case, then again 2.1 will not suffer this 
problem. This doesn't feel like a satisfactory explanation, nor does the 
slightly different possible synchronization behaviour with larger updates 
(snap tree is littered with synchronized() calls, which might overlap more 
often with many updates).

Either way, I'm happy to introduce the mitigation strategy we've discussed, 
since it makes sense in and of itself. However, we clearly do not fully 
understand what is happening in your specific scenario, and I do not want to 
dig further into snap tree - it's a really ugly contraption!
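
To be concrete about what I have in mind, something along these lines - again 
only a sketch extending the hypothetical class above, not the actual patch, 
and the retry threshold is invented:

{code:java}
    private static final int SPIN_LIMIT = 8; // hypothetical threshold

    void addAllWithFallback(NavigableMap<String, String> update)
    {
        for (int attempts = 0; attempts < SPIN_LIMIT; attempts++)
        {
            NavigableMap<String, String> current = ref.get();
            NavigableMap<String, String> modified =
                    new TreeMap<String, String>(current);
            modified.putAll(update);
            if (ref.compareAndSet(current, modified))
                return;
        }
        // Heavy contention: serialize the remaining writers on a lock so the
        // number of discarded copies per write is bounded. We still CAS,
        // because lightly contended writers may bypass the lock, but retries
        // here should be rare.
        synchronized (this)
        {
            while (true)
            {
                NavigableMap<String, String> current = ref.get();
                NavigableMap<String, String> modified =
                        new TreeMap<String, String>(current);
                modified.putAll(update);
                if (ref.compareAndSet(current, modified))
                    return;
            }
        }
    }
{code}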

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, 
> suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse 
> it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.


