gus-asf commented on code in PR #2666:
URL: https://github.com/apache/solr/pull/2666#discussion_r1778007524
##########
solr/core/src/java/org/apache/solr/handler/component/ParallelHttpShardHandler.java:
##########
@@ -41,57 +39,78 @@
@NotThreadSafe
public class ParallelHttpShardHandler extends HttpShardHandler {
+ @SuppressWarnings("unused")
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
private final ExecutorService commExecutor;
+ /*
+ * Unlike the basic HttpShardHandler, this class allows us to exit submit
before
+ * pending is incremented and the responseFutureMap is updated. If the
runnables that
Review Comment:
That quote sounds scary because the code is complex and hard to describe. I
am totally in agreement that we could refactor things further. (it was actually
just refactored recently to use CompleteableFutures). However, I'm trying very
hard to resist the (strong) temptation to completely redesign everything since
I'm hoping a version of this can go in to 9x.
Here's a quick(?!?!) attempt to summarize (and possible first seed material
of a blog post :) ):
The SearchHandler currently has 2 modes that it alternates until deciding it
is done... First it enters a while loop that I will call [sending
mode](https://github.com/apache/solr/blob/d9d2af471610c66d56b43981215afec8e41ff65b/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L517C11-L552C1)
where it sends every outgoing request it knows about using
`shardHandler.submit(sreq, shard, params);`. Then it enters a while loop that I
will call [receiving mode
](https://github.com/apache/solr/blob/d9d2af471610c66d56b43981215afec8e41ff65b/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L557C1-L577C14)
where it uses `ShardResponse srsp = tolerant ?
shardHandler.takeCompletedIncludingErrors() :
shardHandler.takeCompletedOrError()` to drain the responses that it has
received unless something new shows up in outgoing.
These two modes are contained in a while loop that is checking that there
are no more outgoing requests. So the code reverts to send mode any time
there's something to send, and exits if it finishes receiving with nothing
further to send. This outer loop is needed because the receive process may
queue new requests, but if we are done receiving no more requests can be
queued, so at that point if nothing needs sending it's safe to continue.
In the classic single threaded HttpShardHandler, we can use a single atomic
boolean incremented every time we send something and decremented when we
receive something to keep track of how many things it was asked to send, and if
we've received them all.
In the new ParallelHttpShardHandler the send loop only schedules the send
(to be performed by a thread in an executor before passing the thread, and the
send mode loop will exit without any guarantee that even one http request is in
flight yet. The Runnable that is passed to the executor calls
`this.lbClient.requestAsync(lbReq);` and that method calls out into the actual
http communication libraries which might be jetty http, jdk http, or whatever
the latest fad for handling http is in the future. As such it doesn't seem wise
to trust that it completes without exception. The callback is added to the
future on the next line, but we can't be sure that that will happen.
Key take away: With parallel shard submit, we have two levels of tracking,
one for initial intentions to make a request (at the SearchHandler level) and
one for "has the Runnable been processed at all (at the shard handler level).
This is important because [the `take(boolean)` based
methods](https://github.com/apache/solr/blob/d9d2af471610c66d56b43981215afec8e41ff65b/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L310C1-L318C4)
have a loop that depends on obtaining a response for every thing we attempted.
If we don't **track** what we attempt and **track** that the *attempt was made*
(regardless of success/fail) then we run the risk of exiting that loop
**before** we have ***started*** any of our http requests (in the executor
threads which could delay/pause any amount of time)
So the sentence you quote about "needing to know everything we scheduled has
been attempted", is referring specifically the new ParallelHttpShardHandler
class question of whether or not the Runnable has been attempted by the
Executor. So it's about _request initiation_ (remeber it's calling
run**Async**()!), and has nothing to do with request completion.
Not discussed for simplicity:
- tests of responses received vs the shard count that go hay wire the
pending request count never exceeds the shard count.
- fake responses created to satisfy the shard count test when shards are
down.
- empty strings as shard urls that inflate the shard count when shards are
down necessitating the fake response.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]