[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069173#comment-14069173
 ] 

graham sanderson commented on CASSANDRA-7546:
---------------------------------------------

Excellent - I will take a look in the 2.1 branch - I was wondering if there 
were some sample profiles.

The main problem we have in 2.0.x shows up under relatively heavy sustained 
write load. We are already allocating memtable slabs along with all the small 
short-lived objects in the commit log and write path. Hinting adds more 
memtable slabs and, because hints go to a single partition, much larger snap 
trees (whose somewhat contentious lazy-copy-on-write may or may not make 
things worse, I don't know). At that allocation rate we spill huge numbers of 
small objects (possibly snap tree nodes) into the tenured gen along with the 
slabs, which tends to lead to promotion failure and the need for heap 
compaction.

I'll have to play around, but I don't think it is easy to capture in a stress 
test the effect of excessive allocation of (intended to be) temporary objects, 
as opposed to excessive CPU use, because the GC copes really well until it 
doesn't.

Note my belief is your new tree in 2.1 probably mitigates the problem quite a 
bit (no contention in the tree, wider nodes, less rebalancing etc), though I 
suggest we still fix the CAS loop allocation there too.


> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, 
> suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse 
> it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild memory allocation 
> rates (order(s) of magnitude more than expected) can be seen, especially 
> when the updates being hinted are small updates to different partitions, 
> which can happen very fast on their own. See CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.
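
To make the read, clone/update, CAS pattern described above concrete, here is
a minimal sketch. It is not the Cassandra code: the class CasSpinSketch and
its methods are hypothetical, and a plain TreeMap copy stands in for the
snap tree clone. It only illustrates that every failed CAS attempt throws
away a freshly allocated copy, so allocation grows with contention.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; names are not taken from Cassandra.
class CasSpinSketch {
    // Published snapshots are never mutated after a successful CAS.
    private final AtomicReference<TreeMap<Integer, Integer>> state =
            new AtomicReference<>(new TreeMap<>());
    // Counts every clone made, including those discarded after a lost race.
    private final AtomicLong clones = new AtomicLong();

    void add(int key, int value) {
        while (true) {
            TreeMap<Integer, Integer> current = state.get();
            // Allocation happens on every attempt, not only on the winning one.
            TreeMap<Integer, Integer> updated = new TreeMap<>(current);
            clones.incrementAndGet();
            updated.put(key, value);
            if (state.compareAndSet(current, updated)) {
                return;
            }
            // Lost the race: 'updated' is now garbage and we spin again.
        }
    }

    // Starts 'threads' writers that each add 'perThread' distinct keys.
    void run(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    add(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    int size() { return state.get().size(); }

    long cloneCount() { return clones.get(); }

    public static void main(String[] args) {
        CasSpinSketch sketch = new CasSpinSketch();
        sketch.run(8, 500);
        System.out.println("updates applied: " + sketch.size());
        System.out.println("clones allocated: " + sketch.cloneCount());
    }
}
```

The difference between the clone count and the number of applied updates is
the wasted allocation. With a single writer the two are equal; with several
writers on a multi-core machine the clone count would be expected to exceed
it, though by how much varies from run to run.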



--
This message was sent by Atlassian JIRA
(v6.2#6252)
