[jira] [Commented] (KUDU-1395) Scanner KeepAlive requests can get starved on an overloaded server

Jean-Daniel Cryans (JIRA) Tue, 05 Apr 2016 15:51:05 -0700

    [ https://issues.apache.org/jira/browse/KUDU-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227297#comment-15227297 ]


Jean-Daniel Cryans commented on KUDU-1395:
------------------------------------------

bq. Con: typically, a client would like a KeepAlive call to be 
lightweight/fast.

IMO it's still lightweight with that solution, and potentially fast if the 
server does the right thing.

bq. 2) we could add a new RPC system feature such that certain RPCs are allowed 
in a "fast lane"

We already have different queues for different services, so this would add 
another level of complexity. Meh.

bq. 3) some fancier scheduler which tries to estimate and take into account RPC 
costs, and not just deadlines

That sounds fun in the long term.

> Scanner KeepAlive requests can get starved on an overloaded server
> ------------------------------------------------------------------
>
>                 Key: KUDU-1395
>                 URL: https://issues.apache.org/jira/browse/KUDU-1395
>             Project: Kudu
>          Issue Type: Bug
>          Components: impala, rpc, tserver
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first 
> basis, rejecting those with later deadlines. This works well for RPCs which 
> are retried on SERVER_TOO_BUSY errors, since the retries maintain the 
> original deadline and thus get higher and higher priority as they get closer 
> to timing out.
> We don't, however, do any retries on scanner KeepAlive RPCs. So, if a 
> keepalive RPC arrives at a heavily overloaded tserver, it will likely get 
> rejected, and won't retry. This means that Impala queries or other long scans 
> that rely on KeepAlives will likely fail on overloaded clusters since the 
> KeepAlive never gets through.
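
For illustration, here is a rough sketch of the behavior described above. It is 
not Kudu's actual RPC code: the names Rpc, EdfQueue, offer, take and demo are 
hypothetical, and the eviction rule (when the queue is full, whichever call has 
the latest deadline gets rejected) is an assumption based on the description. 
It shows a KeepAlive being rejected on a full queue, and the same call getting 
in later if it were retried with its original deadline.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Rpc:
    # Ordering uses the deadline only; the name is just a label.
    deadline: float
    name: str = field(compare=False)


class EdfQueue:
    """Bounded earliest-deadline-first queue (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._calls = []  # kept sorted by deadline, earliest first

    def offer(self, rpc):
        """Enqueue rpc. Returns the call that was rejected, or None.

        Assumption: when the queue overflows, the call with the latest
        deadline (possibly the new one) is rejected as SERVER_TOO_BUSY.
        """
        bisect.insort(self._calls, rpc)
        if len(self._calls) > self.capacity:
            return self._calls.pop()
        return None

    def take(self):
        """Dequeue the call with the earliest deadline, or None if empty."""
        return self._calls.pop(0) if self._calls else None


def demo():
    q = EdfQueue(capacity=2)
    q.offer(Rpc(5.0, "write-a"))
    q.offer(Rpc(6.0, "write-b"))

    # The KeepAlive has the latest deadline, so it is the one rejected.
    # Without a retry, the scanner would eventually expire.
    first = q.offer(Rpc(10.0, "keepalive"))

    # Later: the server drains the queue and newer calls arrive with
    # later deadlines than the KeepAlive's original one.
    q.take()
    q.take()
    q.offer(Rpc(12.0, "write-c"))
    q.offer(Rpc(13.0, "write-d"))

    # A retry that keeps the original deadline now outranks the newer calls.
    second = q.offer(Rpc(10.0, "keepalive"))
    return first.name, second.name, q.take().name


if __name__ == "__main__":
    print(demo())
```

In this sketch the retried KeepAlive displaces the newest write and is served 
next, which is the "higher and higher priority" effect the description 
attributes to retries that keep their original deadline.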



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)