[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187879#comment-17187879 ]
ASF subversion and git services commented on KUDU-1587:
-------------------------------------------------------
Commit ee3bb83575a051c2feade1f8c159b2902a7160d5 in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ee3bb83 ]
KUDU-1587 part 2: reject write ops if apply queue is overloaded

This patch implements admission control for write requests in tablet
servers based on the load status of their apply queue. With this change,
the recently introduced OpApplyQueueTest.ApplyQueueBackpressure scenario
passes.

If the queue times of the tasks in the apply queue exceed the specified
threshold, the apply queue enters the overloaded state. While the queue
is overloaded, the tablet server rejects incoming write requests with
some probability. The longer the queue stays overloaded, the greater the
probability of rejection. The apply queue exits the overloaded state
when queue times drop below the specified threshold.
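
The behavior described above can be modeled with a short sketch. This is
an illustration only, not the code in this patch: the class name, the
method names, the ramp_ms parameter and the linear growth of the
rejection probability are all assumptions made for the example. The
commit message states only that the probability grows the longer the
queue stays overloaded.

```python
import random


class ApplyQueueOverloadTracker:
    """Hypothetical model of overload-based admission control.

    Not Kudu's implementation: the linear ramp and all names here are
    assumptions chosen to illustrate the described behavior.
    """

    def __init__(self, threshold_ms, ramp_ms=5000.0, rng=None):
        # threshold_ms <= 0 disables the feature (legacy behavior).
        self.threshold_ms = threshold_ms
        # Assumed: time in overload after which every write is rejected.
        self.ramp_ms = ramp_ms
        self.rng = rng if rng is not None else random.Random()
        # Timestamp at which the queue entered the overloaded state,
        # or None while the queue is not overloaded.
        self.overloaded_since_ms = None

    def record_queue_time(self, now_ms, queue_time_ms):
        """Update the overload state from one observed task queue time."""
        if self.threshold_ms <= 0:
            return
        if queue_time_ms > self.threshold_ms:
            # Enter the overloaded state, keeping the original entry time
            # if the queue is already overloaded.
            if self.overloaded_since_ms is None:
                self.overloaded_since_ms = now_ms
        else:
            # Assumed: a queue time at or below the threshold ends overload.
            self.overloaded_since_ms = None

    def rejection_probability(self, now_ms):
        """Probability of rejecting a write, growing with time in overload."""
        if self.overloaded_since_ms is None:
            return 0.0
        elapsed_ms = now_ms - self.overloaded_since_ms
        return min(1.0, elapsed_ms / self.ramp_ms)

    def should_reject(self, now_ms):
        """Randomly decide whether to reject an incoming write request."""
        return self.rng.random() < self.rejection_probability(now_ms)
```

A caller would feed each observed queue time into record_queue_time()
and consult should_reject() before accepting a write. With a threshold
of 0 the tracker never enters the overloaded state, which corresponds to
the legacy behavior.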

This new behavior is not yet enabled by default, so the legacy behavior
of unbounded, uncontrolled queue times is preserved. To enable it, set
--tablet_apply_pool_overload_threshold_ms to a value greater than 0
(e.g., 500).

Change-Id: I6d7688d6fa832e606b8efc4549568fa52dfa1931
Reviewed-on: http://gerrit.cloudera.org:8080/16343
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong
> Memory-based backpressure is insufficient on seek-bound workloads
> -----------------------------------------------------------------
>
> Key: KUDU-1587
> URL: https://issues.apache.org/jira/browse/KUDU-1587
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 0.10.0, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0,
> 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0,
> 1.11.1
> Reporter: Todd Lipcon
> Assignee: Alexey Serbin
> Priority: Critical
> Labels: roadmap-candidate
> Attachments: graph.png, queue-time.png
>
>
> I pushed a uniform random insert workload from a bunch of clients to the
> point that the vast majority of bloom filters no longer fit in buffer cache,
> and the compaction had fallen way behind. Thus, every inserted row turns into
> 40+ seeks (due to non-compact data) and takes 400-500ms. In this kind of
> workload, the current backpressure (based on memory usage) is insufficient to
> prevent ridiculously long queues.