Todd Lipcon created KUDU-1395:
---------------------------------
Summary: Scanner KeepAlive requests can get starved on an overloaded server
Key: KUDU-1395
URL: https://issues.apache.org/jira/browse/KUDU-1395
Project: Kudu
Issue Type: Bug
Components: impala, rpc, tserver
Affects Versions: 0.8.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first basis
and, when overloaded, rejects those with later deadlines. This works well for
RPCs that are retried on SERVER_TOO_BUSY errors: each retry keeps the original
deadline, so the call gains priority as it gets closer to timing out.
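To make the scheduling behavior concrete, here is a minimal toy model of a bounded earliest-deadline-first queue. It is only a sketch under stated assumptions, not Kudu's actual service queue: the names `Call` and `EdfQueue`, the fixed capacity, and the "shed the latest deadline when full" rule are all hypothetical and chosen for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Call:
    deadline: float                       # absolute deadline; the only ordering key
    name: str = field(compare=False)      # label for the demo, ignored when ordering


class EdfQueue:
    """Toy bounded queue: serve the earliest deadline, shed the latest when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.calls = []                   # kept sorted by deadline, earliest first

    def put(self, call):
        """Enqueue `call`; return the rejected call (possibly `call` itself) or None."""
        bisect.insort(self.calls, call)
        if len(self.calls) > self.capacity:
            return self.calls.pop()       # latest deadline is the one turned away
        return None

    def get(self):
        """Dequeue the call with the earliest deadline."""
        return self.calls.pop(0)


# Hypothetical workload: a full queue, then a fresh call with a far-off deadline.
q = EdfQueue(capacity=2)
q.put(Call(10.0, "write"))
q.put(Call(12.0, "scan"))
rejected_fresh = q.put(Call(60.0, "keepalive"))

# A retried call that kept its original, now-imminent deadline displaces "scan".
rejected_by_retry = q.put(Call(11.0, "retried-write"))
```

In this model the fresh call is the one rejected, while a retry that carries its original deadline wins a slot, which is the property the description relies on.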
However, we do not retry scanner KeepAlive RPCs. If a KeepAlive RPC arrives at
a heavily overloaded tserver, it will likely be rejected and never retried. As
a result, Impala queries and other long scans that rely on KeepAlives will
likely fail on overloaded clusters, because the KeepAlive never gets through.
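For contrast, the retry pattern described above for other RPCs can be sketched as a loop that fixes one absolute deadline and reuses it on every attempt. This is an illustrative sketch only, not the Kudu client's code: `ServerTooBusy`, `call_with_retries`, `flaky_send`, and the fixed backoff are hypothetical names and choices. A KeepAlive sent once, outside such a loop, fails on the first rejection.

```python
import time


class ServerTooBusy(Exception):
    """Hypothetical stand-in for a SERVER_TOO_BUSY rejection."""


def call_with_retries(send, timeout_s, backoff_s=0.01,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry `send(deadline)` on ServerTooBusy, reusing one absolute deadline.

    Returns a (result, attempts) tuple; re-raises once the deadline is too close.
    """
    deadline = clock() + timeout_s        # fixed once; retries never push it out
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(deadline), attempts
        except ServerTooBusy:
            if clock() + backoff_s >= deadline:
                raise                     # no time left for another attempt
            sleep(backoff_s)


# Hypothetical server stub: rejects the first two attempts, accepts the third.
seen_deadlines = []


def flaky_send(deadline):
    seen_deadlines.append(deadline)
    if len(seen_deadlines) < 3:
        raise ServerTooBusy()
    return "ok"


result, attempts = call_with_retries(flaky_send, timeout_s=5.0,
                                     sleep=lambda s: None)
```

Every attempt in the demo carries the same deadline, so under earliest-deadline-first scheduling each retry ranks at least as high as the one before it.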