Alexey Serbin created KUDU-3345:
-----------------------------------
Summary: Enforce hard memory limit upon accepting a request into
the server RPC queue
Key: KUDU-3345
URL: https://issues.apache.org/jira/browse/KUDU-3345
Project: Kudu
Issue Type: Improvement
Components: master, tserver
Affects Versions: 1.15.0
Reporter: Alexey Serbin
As of version 1.15.0, {{kudu-tserver}} and {{kudu-master}} both don't take
current memory usage into account when admitting requests into the RPC queue.
The only limit checked by {{ServicePool::QueueInboundCall()}} is the current
length of the RPC service queue, which is controlled by the
{{\-\-rpc_service_queue_length}} flag.
Given that the size of an incoming request might be as large as
{{\-\-rpc_max_message_size}} (50MiB by default) and
{{\-\-rpc_service_queue_length}} might be set high to accommodate a surge of
incoming requests, Kudu servers might go beyond the hard memory limit
controlled by the {{\-\-memory_limit_hard_bytes}} flag. In addition, the Raft
prepare queue doesn't seem to impose a limit on the total size of requests
accumulated in the queue. If a Kudu server consumes too much memory, it might
exit unexpectedly: either it is killed by the OOM killer, or the {{new}}
operator throws {{std::bad_alloc}} and the C++ runtime terminates the process
with {{SIGABRT}}, since memory allocation failures are not handled in the Kudu
code.
At least, we saw evidence of such a situation when disk IO was very slow and
{{kudu-tserver}} accumulated many requests in its prepare queue (probably,
there was a particular workload pattern which first sent many small write
requests and then followed up with big ones).
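One possible shape for such an admission check is sketched below. This is only an illustration of the idea, not Kudu's actual code: the {{MemoryBudget}} class, its {{TryAdmit()}}/{{Release()}} methods, and the fixed byte limit are all hypothetical stand-ins for what would really be a check against Kudu's MemTracker hierarchy inside {{ServicePool::QueueInboundCall()}}.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: a server-wide byte budget mirroring the hard limit
// set by --memory_limit_hard_bytes. A real implementation would consult
// the server's memory tracker rather than a standalone counter.
class MemoryBudget {
 public:
  explicit MemoryBudget(size_t hard_limit_bytes)
      : hard_limit_(hard_limit_bytes), used_(0) {}

  // Admit a request of `bytes` only if doing so stays under the hard
  // limit; otherwise reject it so the caller can respond with a
  // "server overloaded" error instead of queueing the payload.
  bool TryAdmit(size_t bytes) {
    size_t cur = used_.load(std::memory_order_relaxed);
    while (cur + bytes <= hard_limit_) {
      if (used_.compare_exchange_weak(cur, cur + bytes)) {
        return true;  // reserved `bytes` against the budget
      }
      // CAS failed: `cur` was refreshed, re-check against the limit.
    }
    return false;  // would exceed the hard limit: reject the request
  }

  // Return the reservation once the request leaves the queue and
  // its payload is released.
  void Release(size_t bytes) {
    used_.fetch_sub(bytes, std::memory_order_relaxed);
  }

 private:
  const size_t hard_limit_;
  std::atomic<size_t> used_;
};
```

With a check like this at enqueue time, a surge of large requests would be rejected early with a retriable error rather than accumulating in the RPC or Raft prepare queues until the process hits the OOM killer or {{std::bad_alloc}}.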
--
This message was sent by Atlassian Jira
(v8.20.1#820001)