[
https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15678278#comment-15678278
]
Todd Lipcon commented on KUDU-1587:
-----------------------------------
Been pondering this a bit today. Here's a sketch of a possible solution which
shouldn't be difficult to implement:
- use something like the CoDel ("controlled delay") algorithm on the Apply threadpool. Overview is:
-- for each task that comes off the queue, measure its queue time (already done
for the purpose of metrics)
-- if the queue time is above a "target queue time" (e.g. 100ms), then the queue
is in "overloaded" state. Otherwise it is in a "good" state. Overloaded implies
some kind of standing queue.
-- if overloaded, keep track of how long we have been in the overloaded state.
- when a new operation is about to start, check (before PREPARE) whether the
queue is in the overloaded state. If so, reject the write with some
probability. The probability should be based on (a) how many writes have been
dropped so far since we entered the overloaded state, and (b) how many
operations the write contains.
The hope is that, if the apply queue is overloaded, we'll start shedding load
more and more aggressively rather than accumulating a longer and longer queue.
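To make the idea concrete, here is a rough sketch of the scheme described above, written in Python for brevity. It is not Kudu code and not a full CoDel implementation. The class name, the ops_scale knob, and the exact probability formula are all made up for illustration; the proposal only says the probability should grow with the number of drops so far and with the number of operations in the write.

```python
import random
import time


class ApplyQueueOverloadTracker:
    """Hypothetical sketch: tracks overload from per-task queue times."""

    def __init__(self, target_queue_time_s=0.1, ops_scale=1000.0,
                 rng=None, clock=time.monotonic):
        self.target_queue_time_s = target_queue_time_s  # e.g. 100ms
        self.ops_scale = ops_scale  # assumed tuning knob, not from the proposal
        self.rng = rng or random.Random()
        self.clock = clock
        self.overloaded_since = None  # None means the queue is in the "good" state
        self.drops_since_overload = 0

    def on_task_dequeued(self, queue_time_s):
        """Call for each task coming off the queue with its measured queue time."""
        if queue_time_s > self.target_queue_time_s:
            if self.overloaded_since is None:
                # Entering the overloaded state: remember when, reset the drop count.
                self.overloaded_since = self.clock()
                self.drops_since_overload = 0
        else:
            # Queue time is back under the target: return to the "good" state.
            self.overloaded_since = None
            self.drops_since_overload = 0

    def overloaded_duration_s(self):
        """How long the queue has been overloaded (0.0 if it is not)."""
        if self.overloaded_since is None:
            return 0.0
        return self.clock() - self.overloaded_since

    def reject_probability(self, num_ops):
        """Assumed formula: grows with drops so far and with the write's size."""
        if self.overloaded_since is None:
            return 0.0
        return min(1.0, (self.drops_since_overload + 1) * num_ops / self.ops_scale)

    def should_reject(self, num_ops):
        """Call before PREPARE; True means the write should be rejected."""
        if self.rng.random() < self.reject_probability(num_ops):
            self.drops_since_overload += 1
            return True
        return False
```

With these assumed defaults, a 100-operation write arriving right after the queue becomes overloaded would be rejected with probability 0.1, and that probability climbs with each further drop until the queue time falls back under the target.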
Any thoughts on another potential solution that might work well?
> Memory-based backpressure is insufficient on seek-bound workloads
> -----------------------------------------------------------------
>
> Key: KUDU-1587
> URL: https://issues.apache.org/jira/browse/KUDU-1587
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Priority: Critical
> Attachments: graph.png, queue-time.png
>
>
> I pushed a uniform random insert workload from a bunch of clients to the
> point that the vast majority of bloom filters no longer fit in buffer cache,
> and the compaction had fallen way behind. Thus, every inserted row turns into
> 40+ seeks (due to non-compact data) and takes 400-500ms. In this kind of
> workload, the current backpressure (based on memory usage) is insufficient to
> prevent ridiculously long queues.