Alexey Serbin created KUDU-3345:
-----------------------------------

             Summary: Enforce hard memory limit upon accepting a request into 
the server RPC queue
                 Key: KUDU-3345
                 URL: https://issues.apache.org/jira/browse/KUDU-3345
             Project: Kudu
          Issue Type: Improvement
          Components: master, tserver
    Affects Versions: 1.15.0
            Reporter: Alexey Serbin


As of version 1.15.0, neither {{kudu-tserver}} nor {{kudu-master}} takes current memory usage into account when admitting requests into the RPC queue.  The only limit checked by {{ServicePool::QueueInboundCall()}} is the current length of the RPC service queue, which is controlled by the {{\-\-rpc_service_queue_length}} flag.

Given that an incoming request can be as large as {{\-\-rpc_max_message_size}} (50 MiB by default) and {{\-\-rpc_service_queue_length}} might be set high to accommodate a surge of incoming requests, Kudu servers can exceed the hard memory limit controlled by the {{\-\-memory_limit_hard_bytes}} flag.  Also, the Raft prepare queue doesn't seem to impose a limit on the total size of the requests accumulated in it.  If a Kudu server consumes too much memory, it might exit unexpectedly: either the OOM killer terminates it, or the {{new}} operator throws {{std::bad_alloc}} and the C++ runtime aborts the process with {{SIGABRT}}, since memory allocation failures are not handled in the Kudu code.

At least, we saw evidence of such a situation when disk I/O was very slow and {{kudu-tserver}} accumulated many requests in its prepare queue (probably due to a particular workload pattern that first sent many small write requests and then followed up with big ones).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
