> > I'm an Impala dev working on replacing Thrift with krpc. One issue that
> > recently came up is that we would like to have a simple way of simulating
> > different types of failures of rpcs for testing purposes, and I was
> > wondering if krpc already has anything like this built in, or if there's
> > any interest in such a feature being implemented.
> >
> > In the past with Thrift, Impala did this by overriding automatically
> > generated rpc functions to add debugging calls. I have a patch out
> > currently to start doing this with the rpcs that we've ported to krpc so
> > far: https://gerrit.cloudera.org/#/c/12297/
> >
> > That patch would allow tests to be written that pass in options in the form
> > "${RPC_NAME}:${ERROR_TYPE}@ARGS....", for example "CANCEL_QUERY:FAIL@0.5",
> > which would cause CancelQuery rpcs to fail with 50% probability.
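A minimal sketch of how such an option string might be parsed (the `FaultSpec` struct and `ParseFaultSpec` function names are hypothetical, not from the actual patch):

```cpp
#include <cstdlib>
#include <string>

// Hypothetical representation of one "${RPC_NAME}:${ERROR_TYPE}@ARGS" entry.
struct FaultSpec {
  std::string rpc_name;    // e.g. "CANCEL_QUERY"
  std::string error_type;  // e.g. "FAIL"
  double probability;      // parsed from the args after '@'
};

// Parse a single spec such as "CANCEL_QUERY:FAIL@0.5". Returns false if the
// string does not have the expected "name:type@args" shape.
bool ParseFaultSpec(const std::string& s, FaultSpec* out) {
  const size_t colon = s.find(':');
  const size_t at = s.find('@', colon == std::string::npos ? 0 : colon + 1);
  if (colon == std::string::npos || at == std::string::npos) return false;
  out->rpc_name = s.substr(0, colon);
  out->error_type = s.substr(colon + 1, at - colon - 1);
  out->probability = std::strtod(s.c_str() + at + 1, nullptr);
  return true;
}
```

A debug hook in the RPC send path could then look up the current call's name against parsed specs and fail it with the given probability.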
> >
> > It was pointed out in the review that this could potentially be
> > accomplished more cleanly by modifying the code that generates the proxy
> > definitions, e.g. protoc-gen-krpc.cc. We could always just make those
> > modifications in the copy of krpc's code that is checked in to Impala, but
> we'd like to minimize divergence, and of course it's always nice to share
> > code/effort where possible.
> >
>
> I'm not against adding the ability to hook the proxy classes, so long as
> it's perf-neutral when not enabled. I would think you'd want the ability to
> fault an RPC both before it gets sent (so it is never delivered) and also
> to block the response (so the server does process it but the client never
> finds out). It looks like your patch did that. That would help suss out cases
> where you have retries without ensuring proper idempotency, etc.
>
> Another option would be to put the changes in the generic 'Proxy' class --
> or make it possible to pass your own Proxy subclass instance when
> constructing a generated proxy. I think that's cleaner than modifying the
> codegen with hooks.
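To illustrate the shape of that approach, here is a sketch of a fault-injecting `Proxy` subclass covering both fault points mentioned above. All names (`Proxy`, `AsyncRequest`, `FaultInjectingProxy`) are illustrative stand-ins, not the actual krpc interfaces:

```cpp
#include <functional>
#include <random>
#include <string>

// Illustrative stand-in for the generic Proxy class; the real krpc
// interfaces differ.
class Proxy {
 public:
  virtual ~Proxy() = default;
  // Send 'method' and invoke 'cb' when the response arrives. 'ok' reports
  // whether the call succeeded.
  virtual void AsyncRequest(const std::string& method,
                            std::function<void(bool ok)> cb) {
    // A real implementation would serialize and send; here we just "succeed".
    cb(true);
  }
};

// Subclass that can fault a call at either interesting point: before it is
// sent (never delivered) or after the server has processed it (response
// dropped, so the client never hears back).
class FaultInjectingProxy : public Proxy {
 public:
  FaultInjectingProxy(double drop_send_p, double drop_response_p)
      : drop_send_p_(drop_send_p), drop_response_p_(drop_response_p) {}

  void AsyncRequest(const std::string& method,
                    std::function<void(bool ok)> cb) override {
    if (Roll(drop_send_p_)) {
      cb(false);  // Fail before the request ever leaves the client.
      return;
    }
    Proxy::AsyncRequest(method, [this, cb](bool ok) {
      if (Roll(drop_response_p_)) return;  // Swallow the response.
      cb(ok);
    });
  }

 private:
  bool Roll(double p) { return dist_(rng_) < p; }
  double drop_send_p_, drop_response_p_;
  std::mt19937 rng_{42};
  std::uniform_real_distribution<double> dist_{0.0, 1.0};
};
```

A test could then construct the generated proxy with a `FaultInjectingProxy` instance, which keeps all fault-injection logic out of the generated code.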
>
> That said, I don't think we'd make much use of it on the Kudu side. We do
> have a few places where we do fault injection like the above, but more
> often our fault injection works by starting multiple processes and
> controlling the forked daemons with signals, or by making them crash by
> remotely setting fault-injection "crash_on_..." type flags. These kinds of
> faults are a bit more realistic, since after a node crashes it has to
> restart, return to its initial state, etc. It also ensures that we get
> correlated-in-time failures across all the different RPCs headed for that
> host, which can trigger interesting behavior on clients that have multiple
> outstanding requests to the crashed node.

What about simulating network partitions? Do you think we would use
what Thomas is describing for that? Or use something like nftables?

I can't tell if what's being proposed is client-only; if it is, then
it probably wouldn't be appropriate for network partitions.
