[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039466#comment-17039466
 ] 

Blake Eggleston commented on CASSANDRA-15367:
---------------------------------------------

{quote}how you think the deadlock occurs
{quote}
It may be the same one you're referring to. Basically, if a write blocks while 
trying to acquire a partition lock on the new memtable after the final commit 
log position is set but before the write barrier is issued, there's a risk of 
deadlock.

Given op groups O1 & O2, replay positions R1 & R2, and memtables M1 & M2: M1 is 
flushing with a barrier on O1, and the final commit log upper bound R1 is set 
on it. A new write (W1) is assigned to O1 and writes to the commit log at 
position R2, so it will overflow to M2, the new memtable. Before the barrier is 
issued, W1 blocks on a locked partition in M2. Write W2 against O2 then 
acquires the lock ahead of W1, blocks on the allocator, and we deadlock.
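
To make the cycle explicit, here is a minimal sketch that models the scenario 
as a wait-for graph and checks it for a cycle. It is a toy model, not Cassandra 
code: the WaitForGraph class and the node labels are hypothetical, and the edge 
from W2 to the flush assumes the allocator can only hand out memory once M1's 
flush has reclaimed it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy wait-for graph (hypothetical; labels follow the comment, not Cassandra classes). */
public class WaitForGraph {
    // An edge waiter -> blocker means "waiter cannot proceed until blocker does".
    private final Map<String, Set<String>> waitsOn = new HashMap<>();

    public void addWait(String waiter, String blocker) {
        waitsOn.computeIfAbsent(waiter, k -> new HashSet<>()).add(blocker);
    }

    /** Returns true if some chain of waits leads back to its own start. */
    public boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : waitsOn.keySet()) {
            if (visit(node, done, onPath)) return true;
        }
        return false;
    }

    // Depth-first search; a node seen again while still on the current path closes a cycle.
    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : waitsOn.getOrDefault(node, Collections.<String>emptySet())) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    /** Builds the three waits described in the comment above. */
    public static WaitForGraph deadlockScenario() {
        WaitForGraph g = new WaitForGraph();
        // W1 (in O1, overflowed to M2) waits for the partition lock held by W2.
        g.addWait("W1", "W2");
        // W2 (in O2) holds the lock and waits on the allocator, which is assumed
        // to free memory only once M1 has flushed.
        g.addWait("W2", "flush(M1)");
        // The flush of M1 waits on the O1 barrier, i.e. on W1 completing.
        g.addWait("flush(M1)", "W1");
        return g;
    }

    public static void main(String[] args) {
        System.out.println("deadlock: " + deadlockScenario().hasCycle());
    }
}
```

Dropping any one of the three edges breaks the cycle. For example, if W1 never 
has to wait on W2's partition lock, the O1 barrier can complete and the flush 
can proceed.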
{quote}So, I propose a variant of my earlier approach that definitely worked
{quote}
Great, as far as I can tell, this fixes the deadlock 100% and with minimal risk 
of adding new issues or disrupting system behavior. +1

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2


