[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153780#comment-14153780 ]

graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

Just a little update:

I have numbers for one node down & hinting with heap_buffers. I just need to
re-run a few tests first, because there were a couple of spurious points that I
want to verify before I post them (they might have been due to not using a
totally clean cluster every time, and this is not a cluster I can easily
re-create).

Generally this patch seems good so far. There is a non-"sweet spot" where it
can be mildly harmful, but that is basically on the knife edge of
overcommitting your hardware, which is probably not where people are hoping to
be running.

The other point to note is that while the excess GC allocation here does not
cause huge issues on its own, in a busy cluster that already has a huge number
of resident slabs it can cause major knock-on GC headaches (with slabs spilling
into old gen along with other garbage, etc.). The GC issue isn't as much of a
problem with the native allocators in 2.1 (though they do seem to become a
bottleneck under high allocation rates), but the fact that it is still
generally faster with this patch suggests we should keep it on for those too.

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
> 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 
> 7546.20_alt.txt, 7546.20_async.txt, 7546.21_v1.txt, graph2_7546.png, 
> graphs1.png, hint_spikes.png, suggestion1.txt, suggestion1_21.txt, 
> young_gen_gc.png
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine the worse it 
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting (as implemented today) does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.
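The read, clone/update, CAS pattern described in the issue, and why it allocates under contention, can be sketched in Java. This is a minimal, hypothetical illustration: `ContendedPartition`, `WASTE_THRESHOLD_BYTES` and the per-cell size estimate are invented for the example. They are not Cassandra's `AtomicSortedColumns` API, and this is not necessarily what the attached patches do. It shows one possible mitigation in the spirit of the last paragraph: stay lock-free in the common uncontended case, and fall back to a lock once a partition has wasted enough memory on failed CAS attempts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical stand-in for a partition updated by read / clone+update / CAS.
 * NOT Cassandra's AtomicSortedColumns; cells are plain strings here.
 */
class ContendedPartition {
    // Assumed tuning knobs, chosen only for illustration.
    private static final long WASTE_THRESHOLD_BYTES = 1L << 20; // 1 MiB
    private static final long ESTIMATED_BYTES_PER_CELL = 64;

    // The partition state is an immutable snapshot swapped in atomically.
    private final AtomicReference<List<String>> state =
            new AtomicReference<>(Collections.<String>emptyList());
    // Running total of (estimated) bytes allocated by CAS attempts that lost the race.
    private final AtomicLong wastedBytes = new AtomicLong();

    /** Atomically appends all cells in {@code update} to the partition. */
    void addAll(List<String> update) {
        if (wastedBytes.get() < WASTE_THRESHOLD_BYTES) {
            // Common, uncontended path: no lock, just the optimistic loop.
            casLoop(update);
        } else {
            // Heavily contended partition: writers that get here serialize on the
            // lock, so they no longer race each other; only threads still on the
            // optimistic path can make one of them lose a CAS.
            synchronized (this) {
                casLoop(update);
            }
        }
    }

    private void casLoop(List<String> update) {
        while (true) {
            List<String> current = state.get();
            // Clone/update: each attempt allocates a full copy of the partition,
            // so every failed CAS throws away O(partition size) memory.
            List<String> next = new ArrayList<>(current.size() + update.size());
            next.addAll(current);
            next.addAll(update);
            if (state.compareAndSet(current, Collections.unmodifiableList(next))) {
                return;
            }
            // Another writer won; the copy built above is now garbage. Account for it.
            wastedBytes.addAndGet((long) next.size() * ESTIMATED_BYTES_PER_CELL);
        }
    }

    int size() { return state.get().size(); }

    List<String> snapshot() { return state.get(); }

    long wastedBytes() { return wastedBytes.get(); }
}
```

In this sketch the uncontended case pays only one extra atomic read per update, and atomicity never depends on the lock because the CAS is still performed inside it. One limitation is that `wastedBytes` never decays, so a partition that was hot once stays on the locked path forever; a fuller implementation would presumably want some form of time-based decay.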



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
