[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069896#comment-14069896 ]

Benedict commented on CASSANDRA-7546:
-------------------------------------

bq. Alternatively if you are saying, let each thread keep working while they 
still believe they can win, 

This was my original rationale for the patch I posted; however, I am now much 
more in favour of 

bq. a one way switch per Atomic*Columns instance that flips after a number of 
wasted "operations"?

However, whether it is one-way or not is somewhat unimportant to me. The flip 
would only last the lifetime of a memtable, which is not especially long (under 
heavy load probably only a few minutes), and would not have dramatically 
negative consequences if we got it slightly wrong.
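
To make that concrete, the shape I have in mind is roughly the following 
(purely an illustrative sketch - the class, field names and threshold are 
invented here, and this is not any of the attached patches):

{code}
// Sketch only: each Atomic*Columns-like instance keeps a running total of work
// thrown away by losing CASes; once it passes some threshold a one-way flag
// flips. The flag dies with the memtable, so a slightly premature or late flip
// costs little.
import java.util.concurrent.atomic.AtomicLong;

class WasteTrackingSwitch
{
    private static final long WASTE_THRESHOLD = 1L << 16; // illustrative only

    private final AtomicLong wastedWork = new AtomicLong();
    private volatile boolean lockOnWrite;                  // one-way: never reset

    /** Called by a writer after a failed CAS, with an estimate of the discarded work. */
    void recordWaste(long discarded)
    {
        if (!lockOnWrite && wastedWork.addAndGet(discarded) > WASTE_THRESHOLD)
            lockOnWrite = true;
    }

    /** Writers consult this before choosing the optimistic (CAS) or pessimistic (lock) path. */
    boolean shouldLock()
    {
        return lockOnWrite;
    }
}
{code}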

However^2, I'm still having a hard time believing rebalancing costs in SnapTree 
can be that high, and further, if that really is the problem it should not be an 
issue in 2.1, as the b-tree rebalances with O(lg(N)) allocations. I'd be a 
little surprised if SnapTree didn't do the same: if there were more than 
O(lg(N)) allocations, the algorithmic complexity would be > O(lg(N)) as well. 
It's possible that it somehow manages to inter-reference with on-going copies, 
so that we get a highly complex graph retaining exponentially more garbage the 
more competing updates there are, but again I would be very surprised if this 
were the case. Outside of either of these possibilities I would expect all of 
the garbage generated to be immediately collectible, so it would have to be the 
sheer volume alone that overwhelmed the GC; that is certainly possible, but it 
would entail a _lot_ of hinting, and I'd be surprised if a node could be 
receiving a large enough quantity. On the other hand, the arena allocations in 
2.0 are definitely incapable of being collected and could be allocated almost 
as rapidly.

bq. I'm not sure which changes you are talking about back-porting and whether 
the "at most twice" refers to looping once then locking

In this instance I'm referring to copying the source ColumnFamily into a local 
variable once, after failing the CAS, so that we do not keep allocating arena 
space on every retry. Alternatively, we could just do it upfront in the method, 
as the only extra cost is an array allocation proportional in size to the input 
data, which is fairly cheap.
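
Roughly, the idea is the following (an illustrative, self-contained sketch with 
invented names - the arena itself isn't modelled, and this is not the actual 
AtomicSortedColumns code):

{code}
// Sketch of the "copy at most once" idea: the copy of the incoming update is
// made once - here after the first lost CAS, though doing it upfront only costs
// one extra array allocation - and is then reused by every retry instead of
// being re-allocated on each spin.
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnceCasSketch
{
    private final AtomicReference<byte[]> ref = new AtomicReference<>(new byte[0]);

    void addAll(byte[] update)
    {
        byte[] insert = update;
        boolean copied = false;
        while (true)
        {
            byte[] current = ref.get();
            byte[] modified = merge(current, insert);
            if (ref.compareAndSet(current, modified))
                return;

            if (!copied)
            {
                // first lost CAS: take one local copy of the input and keep
                // spinning with it, so retries stop allocating fresh copies
                insert = Arrays.copyOf(update, update.length);
                copied = true;
            }
        }
    }

    private static byte[] merge(byte[] current, byte[] insert)
    {
        byte[] merged = Arrays.copyOf(current, current.length + insert.length);
        System.arraycopy(insert, 0, merged, current.length, insert.length);
        return merged;
    }
}
{code}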

All of this said, I think locking after wasting an excessive number of cycles 
is still good behaviour, so I'm comfortable introducing it either way, and it 
would certainly help with all of the possible causes above.
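
For concreteness, the fallback itself could look something like the following 
(still only a sketch with invented names; for brevity the flip here is 
triggered by a simple per-call spin bound rather than the cumulative waste 
counter above). Under the lock the CAS can only lose to a straggling optimistic 
writer, so no update is ever dropped:

{code}
// Sketch: spin optimistically, but once this instance has flipped to "locked
// mode" - or flips during this call because the loop has wasted too much work -
// take a plain mutex instead of burning further allocations on losing CASes.
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

class SpinThenLockSketch
{
    private static final int SPIN_LIMIT = 8;      // illustrative per-call bound

    private final AtomicReference<long[]> ref = new AtomicReference<>(new long[0]);
    private final ReentrantLock lock = new ReentrantLock();
    private volatile boolean lockOnWrite;         // the one-way switch

    void addAll(long[] update)
    {
        if (!lockOnWrite)
        {
            for (int attempts = 0; attempts < SPIN_LIMIT; attempts++)
            {
                long[] current = ref.get();
                if (ref.compareAndSet(current, merge(current, update)))
                    return;
            }
            lockOnWrite = true;                   // too much wasted work: flip for good
        }

        lock.lock();
        try
        {
            // still CAS under the lock, so a late optimistic writer cannot be lost;
            // with writers serialised here, retries become rare
            while (true)
            {
                long[] current = ref.get();
                if (ref.compareAndSet(current, merge(current, update)))
                    return;
            }
        }
        finally
        {
            lock.unlock();
        }
    }

    private static long[] merge(long[] current, long[] update)
    {
        long[] merged = Arrays.copyOf(current, current.length + update.length);
        System.arraycopy(update, 0, merged, current.length, update.length);
        return merged;
    }
}
{code}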

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, 
> suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, 
> then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some 
> fairly staggering memory growth (the more cores on your machine, the worse it 
> gets).
> Whilst many usage patterns don't do highly concurrent updates to the same 
> partition, hinting today does, and in this case wild (order(s) of magnitude 
> more than expected) memory allocation rates can be seen (especially when the 
> updates being hinted are small updates to different partitions, which can 
> happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation 
> whilst not slowing down the very common un-contended case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
