[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836818#comment-13836818 ]
Benedict commented on CASSANDRA-5549:
-------------------------------------
Without the switch lock, nothing will prevent writes from coming through when
memtables are already using too much memory.
What I'd like to suggest is effectively a global Semaphore, with permits equal
to the size allocated for memtables; on KS.apply(RM) we estimate the size of
the RM and take that many permits. Once we've added the RM and know better how
much it occupies, we adjust the Semaphore to (more) accurately reflect the
amount of memory in use. When we flush a memtable we release permits equal to
the *estimated size* of each RM.
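A rough sketch of the permit accounting described above, to make the three steps concrete. All class and method names here are hypothetical and this is not existing Cassandra code. It assumes one permit stands for one fixed unit of memory (for example 1 KiB, so an int permit count can cover a large heap), and it uses Semaphore.reducePermits for the upward adjustment so that a write which has already been applied never blocks.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not Cassandra code: a global permit pool for memtable memory.
final class MemtableBudget {

    // Semaphore.reducePermits is protected, so expose it through a small subclass.
    private static final class Pool extends Semaphore {
        Pool(int permits) { super(permits); }
        void shrink(int n) { reducePermits(n); }
    }

    private final Pool pool;

    MemtableBudget(int totalPermits) { pool = new Pool(totalPermits); }

    // Before applying a mutation: block until the estimated size fits in the budget.
    void reserve(int estimated) { pool.acquireUninterruptibly(estimated); }

    // After applying: correct the reservation to the better-known size. Never blocks;
    // if the estimate was too low, the available count may temporarily go negative.
    void adjust(int estimated, int actual) {
        if (actual < estimated) pool.release(estimated - actual);
        else if (actual > estimated) pool.shrink(actual - estimated);
    }

    // On flush: hand back everything the flushed memtable was holding.
    void releaseOnFlush(int held) { pool.release(held); }

    int available() { return pool.availablePermits(); }
}
```

A caller would reserve before the write, adjust once the write has landed, and keep a per-memtable running total of adjusted sizes to pass to releaseOnFlush. Whether the write path should block uninterruptibly, as here, or time out is a separate design choice.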
This may be pushing the boat out, but it would probably let us stop relying on
live memtable metering/scanning for size estimation, which we could then retire.
Either way we're estimating the size, but with this approach we're keeping
*tight* control over the (estimated) memory allocated to memtables, whereas at
the moment we have some tricks that we hope keep it there. If we estimate space
used cautiously, we should be able to better guarantee no OOM, at least from
this part of the code.
I have a *reasonably* straightforward scheme for estimating the size used by an
RM that should be as good as what we currently have. The basic premise is to
calculate the average space used by an item in a ConcurrentSkipListMap by
metering a map of, say, 1M entries at startup, and rounding up. If we depend on
CASSANDRA-6271 we can easily calculate exact overhead for the BTrees, or
otherwise can do a similar metering approach for SnapTreeMap. So we have an
overhead per row and per value. Separately we track how much space we are using
for a given memtable's slab allocator. We use the RM's data size only for the
initial estimation, to decide if we have room, and ignore it once it's actually
added, as it will be accounted for in the slab allocator.
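The arithmetic of this estimate can be sketched as follows. The names are hypothetical, and the per-row and per-value overheads are taken as inputs that are assumed to have been measured once at startup; the metering itself is not shown.

```java
// Hypothetical sketch, not Cassandra code: size estimate for one mutation (RM).
final class MutationSizeEstimator {
    private final long perRowOverhead;   // bytes of map/tree structure per row
    private final long perValueOverhead; // bytes of map/tree structure per value

    MutationSizeEstimator(long perRowOverhead, long perValueOverhead) {
        this.perRowOverhead = perRowOverhead;
        this.perValueOverhead = perValueOverhead;
    }

    // Average bytes per entry of a metered map, rounded up so we err on the cautious side.
    static long averageOverhead(long meteredBytes, long entries) {
        return (meteredBytes + entries - 1) / entries;
    }

    private long overhead(int rows, int values) {
        return rows * perRowOverhead + values * perValueOverhead;
    }

    // Up-front estimate, used only to decide whether there is room: overhead plus raw data size.
    long estimateBeforeApply(int rows, int values, long dataSize) {
        return overhead(rows, values) + dataSize;
    }

    // Once applied, the data bytes are counted by the slab allocator,
    // so only the structural overhead is attributed to the mutation itself.
    long overheadAfterApply(int rows, int values) {
        return overhead(rows, values);
    }
}
```

A memtable's total would then be the sum of these per-mutation overheads plus the bytes its slab allocator reports as allocated.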
> Remove Table.switchLock
> -----------------------
>
> Key: CASSANDRA-5549
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5549
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Vijay
> Labels: performance
> Fix For: 2.1
>
> Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png
>
>
> As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write
> path. ReentrantReadWriteLock is not lightweight, even if there is no
> contention per se between readers and writers of the lock (in Cassandra,
> memtable updates and switches).