[
https://issues.apache.org/jira/browse/SOLR-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091253#comment-18091253
]
ASF subversion and git services commented on SOLR-18244:
--------------------------------------------------------
Commit bfd808924e098b8f7340969297b5cfe1ab45ccfb in solr's branch
refs/heads/branch_10x from Mark Robert Miller
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=bfd808924e0 ]
SOLR-18244: Fix several concurrency bugs in ParallelHttpShardHandler (#4548)
> Fix concurrency bugs in HttpShardHandler / ParallelHttpShardHandler that can
> hang searches or crash coordinator
> ---------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-18244
> URL: https://issues.apache.org/jira/browse/SOLR-18244
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Robert Miller
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Fix several concurrency bugs in ParallelHttpShardHandler
> - Catch RejectedExecutionException from commExecutor and route through
> recordShardSubmitError so saturation/shutdown does not propagate
> synchronously and abandon already-submitted shard requests; this
> preserves shards.tolerant semantics and returns a 503 instead of 500.
> - Synchronize submitFutures registration and removal on the same monitor
> HttpShardHandler uses for responseFutureMap so cancelAll observes a
> consistent view of both maps.
> - Skip work in the runAsync runnable when the outer future has already
> been cancelled, avoiding a wasted lbClient.requestAsync call.
> - In HttpShardHandler.takeCompletedIncludingErrors, replace the blocking
> responses.take() with a 50ms poll so subclasses that gate
> responsesPending() on an async tracker (e.g. submitFutures) cannot
> cause a lost-wakeup hang when the tracker drains between the pending
> check and the queue wait.
> - Wrap the inner whenComplete body in a try/catch so a thrown
> transformResponse (or other failure inside the lambda) still enqueues
> a response and unblocks take() instead of stranding the consumer.
> - Expose cancellationLock() and isCanceled() on HttpShardHandler so
> subclasses can synchronize on the same monitor that already guards
> the canceled flag, responseFutureMap, and the responses queue.
> - Synchronize ParallelHttpShardHandler.responsesPending() on the
> cancellation monitor. Without the lock, take()'s loop performs three
> unsynchronized isEmpty() reads (responseFutureMap, responses,
> submitFutures) and can transiently observe all three as empty even
> though super.makeShardRequest's put and outer.whenComplete's
> submitFutures.remove are causally ordered. The intermediate state is
> consistent with each individual read but never with the cross-map
> invariant, so take() exits with null and the pending response (which
> arrives moments later via the inner whenComplete) is silently dropped
> by SearchHandler. Acquiring the lock serializes the three reads with
> every state-mutating operation.
> - Stop using submitFutures.isEmpty() as the loop guard in
> responsesPending(); use an exact AtomicInteger (inFlightSubmits)
> instead. ConcurrentHashMap.size()/isEmpty() are documented estimates:
> sumCount() = baseCount + sum(counterCells), and under contended
> put/remove from many threads the per-cell deltas can settle at a
> non-zero value while the table is physically empty. When that
> happens, isEmpty() returns false on a logically empty map,
> responsesPending() stays true, and the parent thread parks in
> responses.take() forever. AtomicInteger.get() is exact under any
> concurrency. The counter is incremented before runAsync, decremented
> unconditionally in the runAsync future's whenComplete finally, and
> also decremented in the RejectedExecutionException catch (no future)
> and the canceled-before-track early return (no whenComplete). The
> submitFutures map is retained only as the iteration target for
> cancelAll.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]