[
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027080#comment-17027080
]
Benedict Elliott Smith commented on CASSANDRA-15367:
----------------------------------------------------
bq. but I’m not sure if it’s worth addressing
I don't think any deadlock is acceptable to ignore. Hmm. If we don't go with
one of the other approaches I've suggested, I'll have to find some time in the
coming week to see whether there's a variant of this suggested approach that
avoids the deadlock.
bq. <random-idea>
I think this is something I have proposed before, but it's not trivial. I had
planned to implement something like this as part of my work addressing this
problem, but decided not to given the complexity. The idea would be to
introduce a linked list of deferred updates and merge them on future reads or
writes. However, ensuring that everyone sees a consistent view with this
approach, while minimising duplicated work and guaranteeing progress, is less
trivial than I imagined when I first proposed it.
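For concreteness, here is a minimal, hypothetical sketch of the shape of that idea: a lock-free linked list (a Treiber stack) of deferred deltas that a reader folds into a consolidated value on demand. All class and method names are invented for illustration; it is not Cassandra code, and it deliberately sidesteps the hard parts mentioned above by assuming only one thread merges at a time.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: writers never block on a mutex; they push a
// deferred update onto a lock-free linked list, and a later read merges
// the pending entries into the consolidated value.
final class DeferredCounter {
    // A pending, not-yet-merged update.
    private static final class Node {
        final long delta;
        final Node next;
        Node(long delta, Node next) { this.delta = delta; this.next = next; }
    }

    private final AtomicReference<Node> pending = new AtomicReference<>();
    private volatile long merged = 0; // the consolidated value

    // Writers just link a new node with a CAS loop; no lock is taken.
    void add(long delta) {
        Node head, node;
        do {
            head = pending.get();
            node = new Node(delta, head);
        } while (!pending.compareAndSet(head, node));
    }

    // A read detaches the whole pending list atomically and folds it into
    // the merged value. NOTE: this assumes a single merging reader; making
    // concurrent merges consistent without duplicated work is exactly the
    // non-trivial part described above.
    long get() {
        Node head = pending.getAndSet(null); // detach pending updates
        long sum = merged;
        for (Node n = head; n != null; n = n.next) sum += n.delta;
        merged = sum;
        return sum;
    }
}
```

The point of the sketch is that the write path becomes wait-free, but the merge path is where the consistency and progress questions concentrate.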
bq. About removing the lock, I’m sure 15511 will help with contention, and we
should commit it, however I think there will still be pathological cases where
faster updates won’t be enough
We can benchmark this specific scenario, but all we really care about is
whether the aggregate behaviour for all 21 operations is good enough to warrant
removal of the lock, and the commensurate reduction in complexity when
reasoning about the system (which has been _amply_ demonstrated by this
ticket). IMO, the performance numbers from 15511 more than cross this
threshold, but we can certainly explore further verification work to be
certain.
> Memtable memory allocations may deadlock
> ----------------------------------------
>
> Key: CASSANDRA-15367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log, Local/Memtable
> Reporter: Benedict Elliott Smith
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex,
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before
> their flush began
> * Memtables permit operations from this cohort to fall through to the
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort,
> since they block flush (and hence block future memory allocation)
> With very unfortunate scheduling:
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new
> Memtable’s cohort (C2)
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition,
> may fall through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those
> from C2
--
This message was sent by Atlassian Jira
(v8.3.4#803005)