[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017436#comment-17017436
 ] 

Benedict Elliott Smith commented on CASSANDRA-15367:
----------------------------------------------------

For comparison, [this patch|https://github.com/belliottsmith/cassandra/tree/15367-a2] 
addresses this ticket by ensuring that allocations only happen whilst the lock 
is not held. It aims to reduce the need for locking more generally, not just for 
this use case, without removing it altogether.

# So that the fast path is unaffected, we perform our first attempt to insert as 
normal
# Unlike before, we disable {{abortEarly}} for this first attempt, so that we 
always construct a complete new tree
# If this attempt fails, we walk the new tree, looking for any remnants of the 
insert
# These remnants are collected into a new insert containing only the parts that 
were retained after resolving conflicts with the existing data
# This new insert contains only Memtable-allocated data, so nothing needs to be 
copied on the next attempt
# Future attempts to insert operate on this minimal, already-copied version of 
the data, thus preventing the worst-case scenario the lock was introduced for, 
namely Memtable exhaustion
# However, to minimise any performance regression, we retain the lock and 
continue to perform the same waste tracking as before
# If locking has been enabled for the partition, step 1 is skipped, and we 
immediately copy the entire insert into the Memtable before obtaining the lock
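The steps above can be sketched as a small single-process Python model. This is an illustration under stated assumptions, not the patch's code: {{Cell}}, {{Arena}}, {{Partition}}, {{merge}}, {{add_all}}, {{record_waste}}, {{MAX_FAILED_ATTEMPTS}} and the {{before_cas}} hook are all hypothetical; a plain dict stands in for the BTree, last-write-wins by timestamp stands in for reconciliation, and waste tracking is reduced to counting failed attempts.

```python
import threading
from dataclasses import dataclass

MAX_FAILED_ATTEMPTS = 2  # hypothetical threshold, not Cassandra's real one


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int
    in_memtable: bool = False  # True once copied into Memtable memory


class Arena:
    """Hypothetical stand-in for the Memtable allocator; counts copies made."""

    def __init__(self):
        self.copies = 0

    def copy(self, cell):
        if cell.in_memtable:
            return cell  # already Memtable-allocated: nothing to copy
        self.copies += 1
        return Cell(cell.value, cell.timestamp, in_memtable=True)


class Partition:
    """Hypothetical partition: an immutable dict swapped by compare-and-set."""

    def __init__(self):
        self.tree = {}  # never mutated in place; each write publishes a new dict
        self.lock = threading.Lock()  # the per-partition mutex
        self._ref_guard = threading.Lock()  # only emulates an atomic reference
        self.locking_enabled = False
        self.failed_attempts = 0  # simplified stand-in for waste tracking

    def cas(self, expected, new):
        with self._ref_guard:
            if self.tree is not expected:
                return False
            self.tree = new
            return True

    def record_waste(self):
        self.failed_attempts += 1
        if self.failed_attempts > MAX_FAILED_ATTEMPTS:
            self.locking_enabled = True


def merge(current, update, arena):
    """Always build a complete new tree (no early abort); last write wins."""
    new_tree = dict(current)
    for column, cell in update.items():
        old = current.get(column)
        if old is None or cell.timestamp > old.timestamp:
            new_tree[column] = arena.copy(cell)
    return new_tree


def add_all(partition, update, arena, before_cas=None):
    if not partition.locking_enabled:
        # Step 1: the first attempt is the unchanged fast path.
        current = partition.tree
        new_tree = merge(current, update, arena)
        if before_cas:
            before_cas()  # test hook: lets a competing writer win the race
        if partition.cas(current, new_tree):
            return
        partition.record_waste()
        # Keep only the parts of our update that survived reconciliation;
        # they are Memtable-allocated, so later attempts copy nothing.
        update = {c: new_tree[c] for c in update if new_tree[c] is not current.get(c)}
    else:
        # Locking enabled: copy the entire insert before taking the lock.
        update = {c: arena.copy(cell) for c, cell in update.items()}
    while True:
        locked = partition.locking_enabled
        if locked:
            partition.lock.acquire()
        try:
            current = partition.tree
            # Every cell in `update` is Memtable-allocated, so this merge
            # allocates nothing, even while the lock is held.
            if partition.cas(current, merge(current, update, arena)):
                return
            partition.record_waste()
        finally:
            if locked:
                partition.lock.release()
```

In this model, {{Arena.copy}} only does real work on the first attempt or before the lock is taken; once the remnants are collected, every retry (locked or not) merges cells that are already Memtable-allocated, so nothing is allocated while the mutex is held.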

The performance impact of this patch is still being comprehensively validated, 
and the results will be posted in a few days. It is reasonable to expect that 
there will be some slight performance penalty in some cases, and some 
improvements in others.


> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex 
> for the lifetime of the Memtable
> * Memtables block for the completion of every {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall through to the 
> following Memtable, in order to guarantee a precise {{commitLogUpperBound}}
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2)
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall through to the next Memtable
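> * The operations from C1 may execute after the operation from C2 has hit the 
> memory limit while holding the mutex; they then block on that mutex, the C2 
> operation waits for memory that only flushing the prior Memtable can release, 
> and that flush waits for the C1 operations to complete, so none can progress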
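The cycle in the quoted scenario can be checked mechanically by modelling it as a wait-for graph (each waiter mapped to what it waits on) and searching for a cycle with depth-first search. The node names and both graphs below are an illustrative reading of the description, not anything taken from Cassandra's code; the second graph assumes, per the patch above, that a C2 write blocked on memory no longer holds the partition mutex.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph (waiter -> waited-on), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour = {node: WHITE for node in waits_for}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_for):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical wait-for edges for the scenario in the description.
before_patch = {
    "C1 write": ["C2 write"],          # blocked on the mutex the C2 write holds
    "C2 write": ["Memtable 1 flush"],  # blocked on memory only a flush frees
    "Memtable 1 flush": ["C1 write"],  # waits for C1's OpOrder.Group to finish
}

# Assumption: with allocations made outside the lock, the blocked C2 write
# no longer holds the mutex, so the C1 write does not wait on it.
after_patch = {
    "C1 write": [],
    "C2 write": ["Memtable 1 flush"],
    "Memtable 1 flush": ["C1 write"],
}
```

Removing the single edge from the C1 write to the mutex-holding C2 write is enough to break the cycle, which is why allocating only while the lock is not held addresses the deadlock without removing the lock itself.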



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
