gortiz commented on code in PR #18519:
URL: https://github.com/apache/pinot/pull/18519#discussion_r3274471122
##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/mailbox/GrpcSendingMailbox.java:
##########
@@ -208,17 +261,84 @@ public boolean isTerminated() {
return _senderSideClosed || _statusObserver.isFinished();
}
- private StreamObserver<MailboxContent> getContentObserver() {
+ private ClientCallStreamObserver<MailboxContent> getContentObserver() {
Metadata metadata = new Metadata();
metadata.put(ChannelUtils.MAILBOX_ID_METADATA_KEY, _id);
- return PinotMailboxGrpc.newStub(_channelManager.getChannel(_hostname, _port))
+ // We wrap `_statusObserver` in a ClientResponseObserver so we can register the on-ready handler through
+ // `beforeStart` — gRPC rejects setOnReadyHandler() if it is called after open() returns. Wrapping (rather than
+ // making MailboxStatusObserver itself a ClientResponseObserver) keeps the back-pressure plumbing local to this
+ // class. The wrapper delegates the data callbacks unchanged, and signals our `_readyCond` on stream close so a
+ // blocked sender wakes up to observe `_statusObserver.isFinished()` becoming true.
+ ClientResponseObserver<MailboxContent, MailboxStatus> responseObserver =
+ new ClientResponseObserver<MailboxContent, MailboxStatus>() {
+ @Override
+ public void beforeStart(ClientCallStreamObserver<MailboxContent> requestStream) {
+ // Fires on a gRPC channel/Netty thread whenever isReady() transitions false -> true. Just signal; the
+ // sender re-checks the predicate after waking.
+ requestStream.setOnReadyHandler(GrpcSendingMailbox.this::wakeWaiters);
+ }
+
+ @Override
+ public void onNext(MailboxStatus value) {
+ _statusObserver.onNext(value);
+ // Only wake on receiver early-terminate. Transport-level isReady() transitions reach a parked
+ // sender through setOnReadyHandler (registered in beforeStart above); normal buffer-size ACKs
+ // do not change any predicate awaitReady() actually waits on, so signalling them would force a
+ // spurious park/unpark cycle on every receiver ACK. Early-terminate is the one status-only
+ // change (the stream stays open) that awaitReady() must observe promptly, so we still signal
+ // here when its metadata is set.
+ if (Boolean.parseBoolean(
+ value.getMetadataMap().get(ChannelUtils.MAILBOX_METADATA_REQUEST_EARLY_TERMINATE))) {
+ wakeWaiters();
+ }
+ }
+
+ @Override
+ public void onError(Throwable t) {
+ try {
+ _statusObserver.onError(t);
+ } finally {
+ wakeWaiters();
+ }
+ }
+
+ @Override
+ public void onCompleted() {
+ try {
+ _statusObserver.onCompleted();
+ } finally {
+ wakeWaiters();
+ }
+ }
+ };
+
+ return (ClientCallStreamObserver<MailboxContent>) PinotMailboxGrpc.newStub(
+ _channelManager.getChannel(_hostname, _port))
.withInterceptors(MetadataUtils.newAttachHeadersInterceptor(metadata))
.withDeadlineAfter(_deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
- .open(_statusObserver);
+ .open(responseObserver);
}
protected void sendContent(ByteString byteString, boolean waitForMore) {
+ sendContent(byteString, waitForMore, false);
+ }
+
+ protected void sendContent(ByteString byteString, boolean waitForMore, boolean bypassReady) {
+ if (!awaitReady(bypassReady)) {
+ // Either the mailbox was cancelled while we were waiting (normal path) or the gRPC stream is already dead
+ // (bypass path). Either way, skip the send.
+ return;
+ }
+ // Narrow-window race mitigation: a concurrent cancel() may have run between awaitReady() returning true and
+ // here, setting _senderSideClosed and pushing its own error EOS. If we proceed, both threads would call
+ // onNext() on the same non-thread-safe ClientCallStreamObserver. Re-checking after the gate reduces (but
+ // does not fully eliminate) that window; fully eliminating it would require serializing all onNext() calls
+ // under _readyLock, which is more invasive. The bypass path (cancel/close) must push through regardless,
+ // so this guard only applies when bypassReady == false.
+ if (!bypassReady && isTerminated()) {
Review Comment:
You're right — the narrow re-check on its own wasn't enough, and this was
independently flagged by @yashmayya in #3269229790. The full fix landed in
commit `c39e12f1a4` ("Fix data race on _contentObserver by serializing all
outbound calls under `_readyLock`"), after the diff this thread was posted
against.
What's there now: `_readyLock` (the existing `ReentrantLock` that already
guards `_readyCond`) serializes **every** call to `_contentObserver.onNext` /
`onCompleted`, across all three call sites:
* `sendContent` (`GrpcSendingMailbox.java:372`) — lock spans the
post-`awaitReady` `isTerminated()` re-check and the `onNext`.
* `send(MseBlock.Eos, …)` (`:142`) — lock spans `_senderSideClosed = true` +
`onCompleted` so the success-path close is atomic against any racing
`sendContent`.
* `cancel(Throwable)` (`:268`) — lock spans `processAndSend(errorBlock,
bypassReady=true)` + `onCompleted`. The inner `sendContent` reacquires the same
`ReentrantLock` via same-thread reentry.
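
The three call sites above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `OutboundSink`, `RecordingSink`, and `SerializedSender` are hypothetical names, `lock` plays the role of `_readyLock`, `closed` plays the role of `_senderSideClosed`, and the early return on an already-closed sender is an assumption about idempotent close.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the non-thread-safe gRPC request observer.
interface OutboundSink {
  void onNext(String payload);
  void onCompleted();
}

// Records every call so the resulting order can be inspected.
final class RecordingSink implements OutboundSink {
  final List<String> events = new ArrayList<>();
  @Override public void onNext(String payload) { events.add("next:" + payload); }
  @Override public void onCompleted() { events.add("completed"); }
}

final class SerializedSender {
  private final ReentrantLock lock = new ReentrantLock(); // role of _readyLock
  private final OutboundSink sink;
  private volatile boolean closed;                        // role of _senderSideClosed

  SerializedSender(OutboundSink sink) { this.sink = sink; }

  // Data path: the terminated re-check and onNext share one lock hold.
  boolean sendContent(String payload, boolean bypass) {
    lock.lock();
    try {
      if (!bypass && closed) {
        return false; // a concurrent close won the race; skip the send
      }
      sink.onNext(payload);
      return true;
    } finally {
      lock.unlock();
    }
  }

  // Success-path close: flag flip and onCompleted are atomic w.r.t. sendContent.
  void sendEos() {
    lock.lock();
    try {
      if (closed) {
        return; // assumption: closing twice is a no-op
      }
      closed = true;
      sink.onCompleted();
    } finally {
      lock.unlock();
    }
  }

  // Cancel path: error payload plus onCompleted under one lock hold.
  void cancel(String error) {
    lock.lock();
    try {
      if (closed) {
        return; // assumption: closing twice is a no-op
      }
      closed = true;
      sendContent("error:" + error, true); // same thread re-enters the ReentrantLock
      sink.onCompleted();
    } finally {
      lock.unlock();
    }
  }
}
```

Because the lock is reentrant, the nested `sendContent` call inside `cancel` does not deadlock, and a sender that lost the race sees `closed == true` under the lock instead of interleaving its `onNext` with the cancel.
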
`awaitReady` itself stays outside the lock: its slow path acquires
`_readyLock` internally to call `_readyCond.await`, and the fast
`isReady()`-true path remains lock-free. The uncontended `lock()`/`unlock()`
pair added to each send costs on the order of tens of nanoseconds, which is
small next to the cost of `onNext`, and benchmarks showed no measurable impact.
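
For readers unfamiliar with the pattern, here is a rough sketch of a gate with a lock-free fast path and a condition-variable slow path. It is not the PR's `awaitReady`: `ReadyGate` is a hypothetical class, the two `AtomicBoolean` fields stand in for `isReady()` and `isTerminated()`, and the timeout handling and the order of the checks are assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class ReadyGate {
  private final ReentrantLock readyLock = new ReentrantLock();
  private final Condition readyCond = readyLock.newCondition();
  final AtomicBoolean ready = new AtomicBoolean();      // stand-in for isReady()
  final AtomicBoolean terminated = new AtomicBoolean(); // stand-in for isTerminated()

  // Called from callback threads (on-ready, stream close): only signals.
  void wakeWaiters() {
    readyLock.lock();
    try {
      readyCond.signalAll();
    } finally {
      readyLock.unlock();
    }
  }

  // Returns true if the caller may send, false on termination, timeout, or interrupt.
  boolean awaitReady(long timeoutMs) {
    if (terminated.get()) {
      return false;
    }
    if (ready.get()) {
      return true; // fast path: no lock taken
    }
    long remainingNs = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    readyLock.lock();
    try {
      // Re-check the predicate under the lock so a wake-up cannot be lost.
      while (!ready.get()) {
        if (terminated.get() || remainingNs <= 0) {
          return false;
        }
        remainingNs = readyCond.awaitNanos(remainingNs);
      }
      return !terminated.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    } finally {
      readyLock.unlock();
    }
  }
}
```

The waker changes state first and then signals while holding the lock, and the waiter re-checks the predicate in a loop after every wake-up, so spurious wake-ups and early signals are both harmless.
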
The "narrow-window mitigation" comment you were referring to has been
removed; class-level Javadoc now carries the new invariant:
```java
// _readyLock serializes every call to _contentObserver.onNext / onCompleted.
// Acquire it around any new outbound call site you add.
```
Tests in `GrpcSendingMailboxTest` (including `testRemoteCancelledBySender*`,
which specifically exercises the cancel-vs-send race) still pass.