[
https://issues.apache.org/jira/browse/KUDU-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584359#comment-15584359
]
Todd Lipcon commented on KUDU-1707:
-----------------------------------
My thinking here is to do something like the following:
- For each call 'Foo' in an RPC interface, generate a new ServiceIf method like:
{code}
// Default: not handled inline, so the RPC takes the normal queueing path.
virtual bool FooAsync(request, response, context) {
  return false;
}
{code}
Upon receipt of an RPC, this method will be called. If it returns false, the
RPC continues through the normal path. If it returns true, then the RPC
system assumes it has been fully handled, and doesn't queue it. The default
implementation would just return false as shown above. The overhead of the
extra virtual function call should be negligible.
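To make the control flow concrete, here is a rough, self-contained sketch of how the receive path might consult the hook before queueing. All of the names here (InboundCall, ServiceIf::HandleAsync, ServicePool, Outcome, PingService) are hypothetical stand-ins for illustration, not the actual Kudu RPC classes, and a single hook taking the call replaces the per-method FooAsync to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for an inbound RPC; not Kudu's real type.
struct InboundCall {
  std::string method;
  bool responded = false;
};

// Hypothetical service interface with the proposed pre-queue hook.
class ServiceIf {
 public:
  virtual ~ServiceIf() = default;
  // Return true if the call was fully handled inline. The default declines,
  // so existing services keep their current behavior.
  virtual bool HandleAsync(InboundCall* /*call*/) { return false; }
};

// Example override: answer Ping inline, decline everything else.
class PingService : public ServiceIf {
 public:
  bool HandleAsync(InboundCall* call) override {
    if (call->method != "Ping") return false;
    call->responded = true;  // Stands in for sending a success response.
    return true;
  }
};

enum class Outcome { kHandledInline, kQueued, kRejected };

// Hypothetical receive path with a bounded queue.
class ServicePool {
 public:
  ServicePool(ServiceIf* service, std::size_t max_queue_len)
      : service_(service), max_queue_len_(max_queue_len) {}

  // Would run on the reactor thread upon receipt of an RPC.
  Outcome OnCallReceived(InboundCall* call) {
    if (service_->HandleAsync(call)) {
      return Outcome::kHandledInline;  // Never queued, so never rejected.
    }
    if (queue_.size() >= max_queue_len_) {
      return Outcome::kRejected;  // Client must back off and retry.
    }
    queue_.push_back(call);
    return Outcome::kQueued;
  }

  std::size_t queue_length() const { return queue_.size(); }

 private:
  ServiceIf* service_;
  std::size_t max_queue_len_;
  std::deque<InboundCall*> queue_;
};
```

In this sketch the hook is consulted before the queue-length check, which is what lets a lightweight call succeed even when the queue is already full.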
The method would run on the reactor thread, and thus be subject to the normal
reactor restrictions (no sleeps, no locks that might wait, no log messages
under normal circumstances, etc). For the cases mentioned above, we could
implement them as:
*Ping*: just respond to the RPC with a success message (so Pings always succeed
even if the queue is full)
*Keepalive*: update the scanner and respond. (need to be careful that there
aren't circumstances where the scanner lookup could block)
*Consensus*: snooze the failure detector for the tablet and then continue
processing. A trylock might be necessary here since the consensus impl uses
fairly coarse-grained locking. Another alternative would be to check if the
message is a status-only (heartbeat) message, and in that case enqueue it onto
a special processing queue for these "fast" requests.
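For the consensus case, the trylock idea could look roughly like the following sketch. It uses std::mutex with std::try_to_lock so the reactor thread never waits. FailureDetector, TabletConsensus and UpdateConsensusAsync are hypothetical stand-ins, not the real consensus classes, and the real locking scheme may well differ.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a tablet's failure detector.
class FailureDetector {
 public:
  void Snooze() { ++snooze_count_; }
  int snooze_count() const { return snooze_count_; }

 private:
  int snooze_count_ = 0;
};

// Hypothetical stand-in for the coarsely locked consensus implementation.
class TabletConsensus {
 public:
  // Safe to call from the reactor thread: it never waits for the lock.
  // Returns true if the detector was snoozed, false if the lock was busy.
  bool TrySnoozeFailureDetector() {
    std::unique_lock<std::mutex> l(lock_, std::try_to_lock);
    if (!l.owns_lock()) return false;
    detector_.Snooze();
    return true;
  }

  // Unsynchronized read; only for single-threaded inspection in this sketch.
  int snooze_count() const { return detector_.snooze_count(); }

 private:
  std::mutex lock_;
  FailureDetector detector_;
};

// Sketch of the pre-queue hook for a consensus update. It always returns
// false so the update still continues through the normal processing path.
bool UpdateConsensusAsync(TabletConsensus* consensus) {
  // Best effort: if the lock is contended, skip the snooze and let the
  // queued request (if it is accepted) snooze the detector later.
  consensus->TrySnoozeFailureDetector();
  return false;
}
```

The trade-off in this sketch is that a contended lock means the snooze is silently skipped, so it reduces spurious elections under load rather than ruling them out.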
An alternative implementation might be to add a hook which is only called on
rejection. But, I think it would be subject to the same restrictions (running
on the reactor thread, etc) and be less useful in terms of giving quick
responses to Pings, etc.
> Add hook to handle RPCs prior to queueing or rejection
> ------------------------------------------------------
>
> Key: KUDU-1707
> URL: https://issues.apache.org/jira/browse/KUDU-1707
> Project: Kudu
> Issue Type: Bug
> Components: rpc
> Affects Versions: 1.0.1
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> When the Kudu RPC handlers are all busy, RPCs are rejected and it's up to the
> client to back off and retry. This is usually a good idea, but is somewhat
> silly when the RPCs themselves are extremely lightweight. It can even be
> problematic when the RPC is responsible for updating liveness or detecting
> failures, as in cases like:
> - empty consensus updates which just need to update the Raft failure detector
> - the 'Ping' request that ksck uses to determine which tservers are online
> - the Scanner Keepalive call which just needs to keep a scanner open
> For these cases, it would be preferable to allow the RPC to be handled even
> if it would otherwise be rejected. For the consensus heartbeat example in
> particular, this handling would substantially reduce election storms when
> under high load.