Todd Lipcon commented on KUDU-1707:

My thinking here is to do something like the following:
- For each call 'Foo' in an RPC interface, generate a new ServiceIf method like:

virtual bool FooAsync(request, response, context) {
  return false;
}

Upon receipt of an RPC, this method will be called. If it returns false, the 
RPC continues through the normal queueing path. If it returns true, the RPC 
system assumes the call has been fully handled and doesn't queue it. The 
default implementation would just return false as shown above. The overhead 
of the extra virtual function call should be negligible.
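
To make the dispatch order concrete, here is a rough sketch of how the hook could sit in front of the queue. Everything in it is a hypothetical stand-in (Request, Response, Context, Dispatcher, InlineFooService and the Dispatch enum are made up for illustration and are not the real Kudu RPC classes); only the FooAsync name and its return-value contract come from the proposal above.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-ins for the generated RPC types; not Kudu's real classes.
struct Request { std::string payload; };
struct Response { std::string payload; };
struct Context { bool responded = false; };

class ServiceIf {
 public:
  virtual ~ServiceIf() = default;
  // Default hook: decline, so the call follows the normal queueing path.
  virtual bool FooAsync(const Request&, Response*, Context*) { return false; }
};

// A service that fully handles Foo inline (e.g. a Ping-style call).
class InlineFooService : public ServiceIf {
 public:
  bool FooAsync(const Request&, Response* resp, Context* ctx) override {
    resp->payload = "ok";
    ctx->responded = true;  // the response has been sent; nothing left to queue
    return true;
  }
};

enum class Dispatch { kHandledInline, kQueued, kRejected };

// Hypothetical receive path: hook first, then the bounded queue.
class Dispatcher {
 public:
  Dispatcher(ServiceIf* service, std::size_t max_queue)
      : service_(service), max_queue_(max_queue) {}

  Dispatch OnRpcReceived(const Request& req, Response* resp, Context* ctx) {
    // Runs on the reactor thread, so the hook must not block.
    if (service_->FooAsync(req, resp, ctx)) return Dispatch::kHandledInline;
    if (queue_.size() >= max_queue_) return Dispatch::kRejected;
    queue_.push_back(req);
    return Dispatch::kQueued;
  }

 private:
  ServiceIf* service_;
  std::size_t max_queue_;
  std::deque<Request> queue_;
};
```

With a full queue, a service that keeps the default hook still gets rejected, while one that overrides FooAsync is answered inline without touching the queue.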

The method would run on the reactor thread, and thus be subject to the normal 
reactor restrictions (no sleeps, no locks that might wait, no log messages 
under normal circumstances, etc). For the cases mentioned above, we could 
implement them as:

*Ping*: just respond to the RPC with a success message (so Pings always succeed 
even if the queue is full)
*Keepalive*: update the scanner and respond. (need to be careful that there 
aren't circumstances where the scanner lookup could block)
*Consensus*: snooze the failure detector for the tablet and then continue 
processing. A trylock might be necessary here since the consensus impl is 
fairly coarse-grain locked. Another alternative would be to check if the 
message is a status-only (heartbeat) message, and in that case enqueue it onto 
a special processing queue for these "fast" requests.
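
For the Consensus case, the trylock idea could look roughly like the sketch below. TabletConsensus, TrySnoozeFailureDetector and UpdateConsensusAsync are hypothetical names chosen for illustration, and the single mutex only approximates the coarse-grained locking in the real consensus implementation. The point is that the hook never waits on the lock, and always returns false so the request still goes through normal processing.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical coarse-grained consensus state; not Kudu's real class.
class TabletConsensus {
 public:
  using Clock = std::chrono::steady_clock;

  // Reactor-safe: never waits for the lock. Returns true if the failure
  // detector deadline was pushed out, false if the lock was not acquired.
  bool TrySnoozeFailureDetector(Clock::time_point now, Clock::duration timeout) {
    std::unique_lock<std::mutex> l(lock_, std::try_to_lock);
    if (!l.owns_lock()) return false;  // busy: leave it to the normal path
    deadline_ = now + timeout;
    return true;
  }

  Clock::time_point deadline() const {
    std::lock_guard<std::mutex> l(lock_);
    return deadline_;
  }

 private:
  mutable std::mutex lock_;
  Clock::time_point deadline_{};
};

// Hypothetical hook body: snooze if possible, then return false so the
// update is still queued and processed as usual.
bool UpdateConsensusAsync(TabletConsensus* consensus,
                          TabletConsensus::Clock::time_point now,
                          TabletConsensus::Clock::duration timeout) {
  consensus->TrySnoozeFailureDetector(now, timeout);
  return false;
}
```

If the trylock fails, nothing is lost relative to today's behavior: the request is queued (or rejected) exactly as it would have been without the hook.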

An alternative implementation might be to add a hook which is only called on 
rejection. But, I think it would be subject to the same restrictions (running 
on the reactor thread, etc) and be less useful in terms of giving quick 
responses to Pings, etc.

> Add hook to handle RPCs prior to queueing or rejection
> ------------------------------------------------------
>                 Key: KUDU-1707
>                 URL: https://issues.apache.org/jira/browse/KUDU-1707
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 1.0.1
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
> When the Kudu RPC handlers are all busy, RPCs are rejected and it's up to the 
> client to back-off and retry. This is usually a good idea, but is somewhat 
> silly when the RPCs themselves are extremely lightweight. It can even be 
> problematic when the RPC is responsible for updating liveness or detecting 
> failures, as in cases like:
> - empty consensus updates which just need to update the Raft failure detector
> - the 'Ping' request that ksck uses to determine which tservers are online
> - the Scanner Keepalive call which just needs to keep a scanner open
> For these cases, it would be preferable to allow the RPC to be handled even 
> if it would otherwise be rejected. For the consensus heartbeat example in 
> particular, this handling would substantially reduce election storms when 
> under high load. 
