[
https://issues.apache.org/jira/browse/CASSANDRA-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189583#comment-15189583
]
Benedict commented on CASSANDRA-11327:
--------------------------------------
Perhaps you should outline precisely the algorithm you propose, since there's a
whole class of similar algorithms and it would narrow the discussion?
But reducing the total memory available for memtables, as you state, must by
definition increase latency for those writes that would have been fully
accommodated by the full buffer capacity (and no longer can be, due to the
artificial reduction). The only case where this does not affect latency is when
the cluster is overloaded - which admittedly all of our performance tests
induce, despite this not being at all what Cassandra is designed for.
Memtables are there to smooth out the natural variance in the message arrival
distribution. A properly tuned cluster would ensure that overload occurs only at
some SLA frequency, say a 3 sigma chance. By reducing their size, transient
overload becomes more frequent, and SLAs are not met or the cluster capacity
must be increased. Now, a Cassandra cluster simply _cannot_ cope with
sustained overload, no matter what we do here; LSMTs seal our fate very rapidly
in that situation. So I don't personally see the rationale for making
transient overload (Cassandra's strong suit) worse, in exchange for a really
temporary reprieve on sustained overload.
bq. I wasn't aware the partially off heap and off heap memtables were able to
reclaim memory incrementally during flushing.
They aren't, but the patch I linked introduced this against a pre-2.1 branch.
It wasn't exactly trivial to do, though (it introduced a constrained pauseless
compacting GC), and it is probably better to wait until TPC to think about
reattempting this.
> Maintain a histogram of times when writes are blocked due to no available
> memory
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-11327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11327
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Ariel Weisberg
>
> I have a theory that part of the reason C* is so sensitive to timeouts during
> saturating write load is that throughput is basically a sawtooth with valleys
> at zero. This is something I have observed and it gets worse as you add 2i to
> a table or do anything that decreases the throughput of flushing.
> I think the fix for this is to incrementally release memory pinned by
> memtables and 2i during flushing instead of releasing it all at once. I know
> that's not really possible, but we can fake it with memory accounting that
> tracks how close to completion flushing is and releases permits for
> additional memory. This will lead to a bit of a sawtooth in real memory
> usage, but we can account for that so the peak footprint is the same.
> I think the end result of this change will be a sawtooth, but the valley of
> the sawtooth will not be zero; it will be the rate at which flushing
> progresses. Optimizing the rate at which flushing progresses and its
> fairness with other work can then be tackled separately.
> Before we do this I think we should demonstrate that pinned memory due to
> flushing is actually the issue by getting better visibility into the
> distribution of instances of not having any memory by maintaining a histogram
> of spans of time where no memory is available and a thread is blocked.
> [MemtableAllocator$SubPool.allocate(long)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java#L186]
> should be a relatively straightforward entry point for this. The first
> thread to block can mark the start of memory starvation and the last thread
> out can mark the end. Have a periodic task that tracks the amount of time
> spent blocked per interval of time and if it is greater than some threshold
> log with more details, possibly at debug.
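
The tracking described above (first thread to block opens a starvation span,
last thread out closes it, and a periodic task checks the blocked time per
interval) could be sketched roughly as follows. This is only an illustration
under assumptions: StarvationTracker and all of its methods are hypothetical
names, not existing Cassandra classes, and the power-of-two histogram buckets
are an arbitrary choice made for brevity.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical helper for illustration; not an existing Cassandra class.
class StarvationTracker {
    // buckets.get(i) counts starvation spans lasting [2^i, 2^(i+1)) ns;
    // spans of 0 ns are also counted in bucket 0.
    private final AtomicLongArray buckets = new AtomicLongArray(64);
    // Total blocked time of spans that closed since the last drain.
    private final AtomicLong blockedNanosThisInterval = new AtomicLong();

    private int blockedThreads;   // guarded by "this"
    private long spanStartNanos;  // guarded by "this"

    // Call just before a writer starts waiting for memtable memory.
    synchronized void enterBlocked(long nowNanos) {
        if (blockedThreads++ == 0)
            spanStartNanos = nowNanos; // first thread in marks the start
    }

    // Call once the writer has obtained memory and stops waiting.
    synchronized void exitBlocked(long nowNanos) {
        if (blockedThreads == 0)
            throw new IllegalStateException("exitBlocked without enterBlocked");
        if (--blockedThreads == 0) {   // last thread out marks the end
            long span = Math.max(0L, nowNanos - spanStartNanos);
            int bucket = span == 0 ? 0 : 63 - Long.numberOfLeadingZeros(span);
            buckets.incrementAndGet(bucket);
            blockedNanosThisInterval.addAndGet(span);
        }
    }

    long bucketCount(int bucket) {
        return buckets.get(bucket);
    }

    // Returns the blocked time recorded since the previous call and resets it.
    // Spans that are still open are only counted once they close.
    long drainBlockedNanos() {
        return blockedNanosThisInterval.getAndSet(0L);
    }

    // Intended for a periodic task: true means "log with more details".
    boolean intervalExceeded(long thresholdNanos) {
        return drainBlockedNanos() > thresholdNanos;
    }
}
```

In such a sketch, the blocking path in the allocator would call
enterBlocked(System.nanoTime()) before waiting and
exitBlocked(System.nanoTime()) afterwards, and a scheduled task would call
intervalExceeded(...) once per interval. How this would actually be wired into
MemtableAllocator is left open here.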