[Java] [Flight] Questions around DoGet implementations with flow control

Nathaniel Jones Fri, 10 Mar 2023 21:28:18 -0800

Hello,

I'm hoping to check my understanding around various ways to implement a
DoGet handler with respect to flow control, and then inquire about
potential future API changes.

First, I'm aware of 3 ways to respect flow control when implementing a Java
server's DoGet that have different characteristics:

1. Busy-Waiting / Thread.sleep()-ing:
1. Implement a blocking body that loops (and maybe sleeps) while the
ServerStreamListener's isReady is false (and respect isCancelled, too)
2. Using BackpressureStrategy

<https://sourcegraph.com/github.com/apache/arrow/-/blob/java/flight/flight-core/src/main/java/org/apache/arrow/flight/BackpressureStrategy.java?L28:8>
(and
specifically CallbackBackpressureStrategy

<https://sourcegraph.com/github.com/apache/arrow/-/blob/java/flight/flight-core/src/main/java/org/apache/arrow/flight/BackpressureStrategy.java?L75>
since
one could implement option #1 above as a simple strategy):
1. My own experiments with CallbackBackpressureStrategy /
understanding from initial PR discussion
<https://github.com/apache/arrow/pull/8476#issuecomment-710777211>
demonstrate
that the DoGet handler must be run on a separate thread; you can't invoke
the "waitForListener" on the thread that gRPC uses to invoke this RPC
because if you're blocking (in this case Thread await-ing) on this gRPC
thread, gRPC can't process onReady callbacks for this RPC, and thus
CallbackBackpressureStrategy would never be notify-ed to wake up
3. Write a fully async implementation relying directly on underlying
CallStreamObserver's setOnReadyHandler:
1. This is similar in spirit to #2 above, but now operates completely
on threads from gRPC's thread pool (the onReady handler *is* the
DoGet logic). The code looks very roughly like:
1. make VectorSchemaRoot with some schema and allocator
2. On our ServerStreamListener, invoke listener.start(root)
immediately
3. set listener.setOnReadyHandler(<the DoGet logic to stream data
to client>)
4. So, no blocking

My understanding of the trade-offs between options #2 and #3 above:

- In CallbackBackpressureStrategy with background threads, there'll be
(# threads in gRPC pool) + (sum of background threads across currently
running DoGet streams) threads. If the actual streaming logic is intensive
/ CPU bound, it might be good for that to live on a background thread,
because if the gRPC threads were tied up in intensive callbacks (and even
exhausted), new RPC requests / ready and cancel callbacks could be slow /
stuck.
- On the other hand, for quick I/O bound work in DoGet logic, option #3
might work well if it's not worth the extra threads / context switching,
and gRPC threads can handle everything quickly.
- So overall, it seems like different workloads could demand a different
model, which should inform how DoGet logic is written.

*Q1: *Is my understanding of the above points approximately correct? I'd
appreciate any pointers on items I'm misunderstanding.

*Q2: *Assuming option #3 has utility in some cases, I'm curious if there is
a way in Flight to expose an easier API to implement DoGet fully
asynchronously. I find it tricky to manage cleanup of the VectorSchemaRoot
as well as being careful about things like respecting isCancelled becoming
true. Developers also "need to know" that the async setOnReadyHandler
exists at the gRPC layer and understand its benefits. I noticed that on
ARROW-4484 <https://issues.apache.org/jira/browse/ARROW-4484>, which
focuses on DoPut's busy-wait, the first comment mentions that this applies
to DoGet as well - though it seems GetListener's waitUntilStreamReady
<https://sourcegraph.com/github.com/apache/arrow/-/blob/java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightService.java?L189-192>
doesn't
busy-wait, I was curious if the spirit of the comment was similar to my
question around an easier way to write the logic asynchronously? If so,
would the idea be for Flight to wrap gRPC's callbacks and expose those to
DoGet authors, while helping to abstract some cleanup items?

Thank you for any feedback,
Nate

[Java] [Flight] Questions around DoGet implementations with flow control

Reply via email to