[jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock

Blake Eggleston (Jira) Fri, 24 Jan 2020 16:42:14 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023358#comment-17023358 ]


Blake Eggleston commented on CASSANDRA-15367:
---------------------------------------------

Yep, I think that would fix the problem. Another approach, one that wouldn’t 
have the potential to introduce delays, would be to skip locking if we have set 
(or are about to set) the final replay position on a memtable that is waiting 
on an op group. This is similar to marking the group as blocking, except that 
it won’t bypass the allocator, in case the flush queue is long. That would fix 
the deadlock without delaying later writes, although it could increase 
contention.

Rough example with lazy naming 
[here|https://github.com/bdeggleston/cassandra/tree/15367-alternative-2]
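
A minimal sketch of the skip-locking idea, assuming a simplified model rather than the code on the linked branch: `SkipLockSketch`, `Memtable`, `Partition`, `finalReplayPositionSet` and `apply` are all hypothetical names. The partition is updated with an optimistic compare-and-set loop, and the mutex only serializes writers on a contended partition. Once the memtable's final replay position is set (or about to be), writers stop taking the mutex, so a write from the earlier op group can never wait behind a lock held by a write from the later one.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Hypothetical model for illustration; these are not Cassandra's classes.
final class SkipLockSketch {
    static final class Memtable {
        // Assumed flag: true once flush has set (or is about to set) the final replay position.
        volatile boolean finalReplayPositionSet;
    }

    static final class Partition<T> {
        final AtomicReference<T> contents;
        // Only used to serialize writers on a contended partition.
        final ReentrantLock lock = new ReentrantLock();

        Partition(T initial) {
            contents = new AtomicReference<>(initial);
        }
    }

    /** Applies update to the partition; returns true if the partition lock was taken. */
    static <T> boolean apply(Memtable memtable, Partition<T> partition,
                             boolean contended, UnaryOperator<T> update) {
        // Skip the lock once the replay position is final, so an earlier-group write
        // can never wait behind a lock holder from the later group.
        boolean takeLock = contended && !memtable.finalReplayPositionSet;
        if (takeLock) {
            partition.lock.lock();
        }
        try {
            // Optimistic CAS loop: correct with or without the lock, but racing
            // writers may retry (and allocate wastefully) when it is skipped.
            while (true) {
                T current = partition.contents.get();
                T next = update.apply(current);
                if (partition.contents.compareAndSet(current, next)) {
                    return takeLock;
                }
            }
        } finally {
            if (takeLock) {
                partition.lock.unlock();
            }
        }
    }
}
```

The trade-off mirrors the one described in the comment: without the lock, concurrent writers on a hot partition fall back to CAS retries, which is where the extra contention would come from.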

It would be nice if a write waiting for a lock could unblock itself as soon as 
its op group becomes blocking.
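
One way such self-unblocking could look, sketched under assumptions: `UnblockingLockSketch.lockUnlessBlocking` is a hypothetical helper, and the `isBlocking` supplier stands in for asking the writer's op group whether it has become blocking. The writer waits for the lock in short timed slices and gives up as soon as the group is blocking, falling back to an unlocked (CAS-only) update. Polling keeps the sketch short; a real version would more likely be woken by a signal when the group becomes blocking.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of Cassandra.
final class UnblockingLockSketch {
    /**
     * Waits for the lock in slices of pollMillis, but stops waiting once isBlocking
     * reports that the caller's op group has become blocking.
     * Returns true if the lock was acquired (the caller must unlock it),
     * false if the caller should proceed without the lock.
     */
    static boolean lockUnlessBlocking(ReentrantLock lock, BooleanSupplier isBlocking,
                                      long pollMillis) {
        while (!isBlocking.getAsBoolean()) {
            try {
                if (lock.tryLock(pollMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                // Preserve the interrupt status and fall back to the unlocked path.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```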

Random thoughts about longer term fixes:

I didn’t have a chance to get my head around how you’d intended to remove the 
lock completely, but I don’t understand how that could be done without 
reintroducing the contention gc problem.

It seems to me that the root cause of all this is that we have 2 mechanisms for 
ordering events (OpOrder and ReplayPosition) which are mostly independent, but 
have to interact in non-deterministic ways during memtable flush, which creates 
these edge cases. I think the right fix (or one of them) is to either merge 
these two classes, or make one control the other.
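
To make the "one controls the other" option concrete, here is a toy model under loud assumptions: `UnifiedOrderSketch`, `Ticket`, `start`, `issueBarrier` and `upperBound` are hypothetical, and the log position is just a byte counter. Joining an ordering group and reserving a log position happen in one atomic step, so when a flush closes a group the recorded upper bound is exact, and no write from a closed group needs to fall through to the next memtable to keep that bound precise.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration; not Cassandra's OpOrder or commit log API.
final class UnifiedOrderSketch {
    /** A write's ordering group and its commit-log start position, issued together. */
    static final class Ticket {
        final long group;
        final long position;

        Ticket(long group, long position) {
            this.group = group;
            this.position = position;
        }
    }

    private long currentGroup = 0;
    private long nextPosition = 0;
    // upperBounds.get(g) is the log position at which group g was closed.
    private final List<Long> upperBounds = new ArrayList<>();

    /** Starts a write: joins the current group and reserves size bytes of log in one step. */
    synchronized Ticket start(long size) {
        Ticket ticket = new Ticket(currentGroup, nextPosition);
        nextPosition += size;
        return ticket;
    }

    /** Closes the current group (e.g. for a flush) and returns its exact upper bound. */
    synchronized long issueBarrier() {
        upperBounds.add(nextPosition);
        currentGroup++;
        return nextPosition;
    }

    /** Every write in a closed group ends at or before this position. */
    synchronized long upperBound(long group) {
        return upperBounds.get((int) group);
    }
}
```

The obvious cost is that every write start goes through a single monitor; a real design would need a much cheaper way to tie the two mechanisms together.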

> Memtable memory allocations may deadlock
> ----------------------------------------
>
>                 Key: CASSANDRA-15367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log, Local/Memtable
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>             Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
>
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2
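
The scenario in the quoted description is a cycle in a wait-for graph. The sketch below models it with hypothetical node labels: the C1 write waits for the mutex held by the C2 write, the C2 write waits for memory that only a flush can reclaim, and the flush waits for the C1 op group to finish. `WaitForCycle.hasCycleFrom` is an illustrative helper, not a Cassandra API; it follows the edges and reports whether they lead back to the start.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative deadlock check; node names are hypothetical labels, not Cassandra APIs.
final class WaitForCycle {
    /**
     * Each key waits for exactly one value. Returns true if following the
     * "waits for" edges from start leads back to start (a deadlock).
     */
    static boolean hasCycleFrom(Map<String, String> waitsFor, String start) {
        Set<String> visited = new HashSet<>();
        String node = waitsFor.get(start);
        // Stop at a node with no outgoing edge, or at any node seen before.
        while (node != null && visited.add(node)) {
            if (node.equals(start)) {
                return true;
            }
            node = waitsFor.get(node);
        }
        return false;
    }
}
```

For example, with edges `C1 write -> C2 write`, `C2 write -> flush` and `flush -> C1 write`, `hasCycleFrom(graph, "C1 write")` returns `true`. Removing the `C1 write -> C2 write` edge, i.e. letting C1 writes avoid waiting on the mutex as proposed in the comment, breaks the cycle.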



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
