Re: Simulating failures in krpc

Todd Lipcon Fri, 01 Feb 2019 08:22:30 -0800

On Thu, Jan 31, 2019 at 3:11 PM Thomas Tauber-Marshall
<[email protected]> wrote:


> I'm an Impala dev working on replacing Thrift with krpc. One issue that
> recently came up is that we would like to have a simple way of simulating
> different types of failures of rpcs for testing purposes, and I was
> wondering if krpc already has anything like this built in, or if there's
> any interest in such a feature being implemented.
>
> In the past with Thrift, Impala did this by overriding automatically
> generated rpc functions to add debugging calls. I have a patch out
> currently to start doing this with the rpcs that we've ported to krpc so
> far: https://gerrit.cloudera.org/#/c/12297/
>
> That patch would allow tests to be written that pass in options in the form
> "${RPC_NAME}:${ERROR_TYPE}@ARGS....", for example "CANCEL_QUERY:[email protected]",
> which would cause CancelQuery rpcs to fail with 50% probability.
>
> It was pointed out in the review that this could potentially be
> accomplished more cleanly by modifying the code that generates the proxy
> definitions, e.g. protoc-gen-krpc.cc. We could always just make those
> modifications in the copy of krpc's code that is checked in to Impala, but
> we'd like to minimize divergence, and of course it's always nice to share
> code/effort where possible.
>

I'm not against adding the ability to hook the proxy classes, so long as
it's perf-neutral when not enabled. I would think you'd want the ability to
fault an RPC both before it gets sent (so it is never delivered) and also
to block the response (so the server does process it but the client doesn't
realize). It looks like your patch did that. That would help suss out cases
where you have retries without ensuring proper idempotency, etc.
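
A minimal sketch of the two fault points described above, using a toy
in-process "server" rather than krpc itself. Every name here (Fault,
CounterServer, CallIncrement, IncrementWithRetry) is hypothetical and
exists only for illustration. It shows why dropping the response is the
more revealing fault: a naive retry of a non-idempotent call applies the
operation twice.

```cpp
#include <vector>

// Hypothetical fault kinds for illustration; not part of krpc.
enum class Fault { kNone, kDropRequest, kDropResponse };

// Toy server with a non-idempotent operation.
struct CounterServer {
  int value = 0;
  void Increment() { ++value; }
};

// Simulates one RPC attempt. Returns true only if the client sees a reply.
bool CallIncrement(CounterServer& server, Fault fault) {
  if (fault == Fault::kDropRequest) {
    return false;  // Fault before send: the server never sees the request.
  }
  server.Increment();  // The server processes the request.
  if (fault == Fault::kDropResponse) {
    return false;  // Fault after processing: the client never sees the reply.
  }
  return true;
}

// Naive client retry loop: one entry in `faults` per attempt, stopping at
// the first attempt the client observes as successful. Returns the number
// of attempts made.
int IncrementWithRetry(CounterServer& server, const std::vector<Fault>& faults) {
  int attempts = 0;
  for (Fault fault : faults) {
    ++attempts;
    if (CallIncrement(server, fault)) {
      break;
    }
  }
  return attempts;
}
```

With {kDropRequest, kNone} the counter ends at 1, as intended. With
{kDropResponse, kNone} it ends at 2, because the first attempt was applied
even though the client saw a failure.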

Another option would be to put the changes in the generic 'Proxy' class --
or make it possible to pass your own Proxy subclass instance when
constructing a generated proxy. I think that's cleaner than modifying the
codegen with hooks.
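
A rough sketch of the "pass your own Proxy subclass" idea, under stated
assumptions. The classes below are simplified stand-ins, not krpc's
actual Proxy or generated code: the virtual SyncRequest signature, the
Status struct, FaultInjectingProxy, and QueryServiceProxy are all
hypothetical. The point is only that the generated proxy forwards to
whatever base-class instance it is given, so the default path carries no
fault-injection logic.

```cpp
#include <memory>
#include <random>
#include <string>
#include <utility>

// Simplified status type for this sketch (not kudu::Status).
struct Status {
  bool ok;
  std::string msg;
};

// Stand-in for a generic proxy base class. Here it just echoes the request
// instead of talking to a remote server.
class Proxy {
 public:
  virtual ~Proxy() = default;
  virtual Status SyncRequest(const std::string& method,
                             const std::string& req, std::string* resp) {
    (void)method;  // Unused in this toy implementation.
    *resp = "echo:" + req;
    return {true, ""};
  }
};

// Test-only subclass: fails calls to one method with a given probability
// before they are "sent"; everything else goes to the base implementation.
class FaultInjectingProxy : public Proxy {
 public:
  FaultInjectingProxy(std::string method, double probability, unsigned seed)
      : method_(std::move(method)), fail_(probability), rng_(seed) {}

  Status SyncRequest(const std::string& method, const std::string& req,
                     std::string* resp) override {
    if (method == method_ && fail_(rng_)) {
      return {false, "injected failure"};
    }
    return Proxy::SyncRequest(method, req, resp);
  }

 private:
  std::string method_;
  std::bernoulli_distribution fail_;
  std::mt19937 rng_;
};

// Stand-in for a generated service proxy that accepts any Proxy instance.
class QueryServiceProxy {
 public:
  explicit QueryServiceProxy(std::unique_ptr<Proxy> impl)
      : impl_(std::move(impl)) {}

  Status CancelQuery(const std::string& req, std::string* resp) {
    return impl_->SyncRequest("CancelQuery", req, resp);
  }

 private:
  std::unique_ptr<Proxy> impl_;
};
```

A test would construct QueryServiceProxy with a FaultInjectingProxy (for
example, failing "CancelQuery" with probability 0.5), while production
code would pass a plain Proxy and never touch the injection path.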

That said, I don't think we'd make much use of it on the Kudu side. We do
have a few places we do fault injection like the above, but more often our
fault injection works by starting multiple processes and actually
controlling the forked daemons by signals, or otherwise making them crash by
remotely setting fault injection "crash_on_..." type flags. These kinds of
faults are a bit more realistic since after a node crashes it will have to
restart, go back to initial states, etc. It also ensures that we get
correlated-in-time failures across all different RPCs headed for the host,
which can trigger interesting behavior on clients that might have multiple
outstanding requests to the crashed node.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
