gortiz commented on code in PR #18519:
URL: https://github.com/apache/pinot/pull/18519#discussion_r3274471122


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/mailbox/GrpcSendingMailbox.java:
##########
@@ -208,17 +261,84 @@ public boolean isTerminated() {
     return _senderSideClosed || _statusObserver.isFinished();
   }
 
-  private StreamObserver<MailboxContent> getContentObserver() {
+  private ClientCallStreamObserver<MailboxContent> getContentObserver() {
     Metadata metadata = new Metadata();
     metadata.put(ChannelUtils.MAILBOX_ID_METADATA_KEY, _id);
 
-    return PinotMailboxGrpc.newStub(_channelManager.getChannel(_hostname, _port))
+    // We wrap `_statusObserver` in a ClientResponseObserver so we can register the on-ready handler through
+    // `beforeStart` — gRPC rejects setOnReadyHandler() if it is called after open() returns. Wrapping (rather than
+    // making MailboxStatusObserver itself a ClientResponseObserver) keeps the back-pressure plumbing local to this
+    // class. The wrapper delegates the data callbacks unchanged, and signals our `_readyCond` on stream close so a
+    // blocked sender wakes up to observe `_statusObserver.isFinished()` becoming true.
+    ClientResponseObserver<MailboxContent, MailboxStatus> responseObserver =
+        new ClientResponseObserver<MailboxContent, MailboxStatus>() {
+          @Override
+          public void beforeStart(ClientCallStreamObserver<MailboxContent> requestStream) {
+            // Fires on a gRPC channel/Netty thread whenever isReady() transitions false -> true. Just signal; the
+            // sender re-checks the predicate after waking.
+            requestStream.setOnReadyHandler(GrpcSendingMailbox.this::wakeWaiters);
+          }
+
+          @Override
+          public void onNext(MailboxStatus value) {
+            _statusObserver.onNext(value);
+            // Only wake on receiver early-terminate. Transport-level isReady() transitions reach a parked
+            // sender through setOnReadyHandler (registered in beforeStart above); normal buffer-size ACKs
+            // do not change any predicate awaitReady() actually waits on, so signalling them would force a
+            // spurious park/unpark cycle on every receiver ACK. Early-terminate is the one status-only
+            // change (the stream stays open) that awaitReady() must observe promptly, so we still signal
+            // here when its metadata is set.
+            if (Boolean.parseBoolean(
+                value.getMetadataMap().get(ChannelUtils.MAILBOX_METADATA_REQUEST_EARLY_TERMINATE))) {
+              wakeWaiters();
+            }
+          }
+
+          @Override
+          public void onError(Throwable t) {
+            try {
+              _statusObserver.onError(t);
+            } finally {
+              wakeWaiters();
+            }
+          }
+
+          @Override
+          public void onCompleted() {
+            try {
+              _statusObserver.onCompleted();
+            } finally {
+              wakeWaiters();
+            }
+          }
+        };
+
+    return (ClientCallStreamObserver<MailboxContent>) PinotMailboxGrpc.newStub(
+            _channelManager.getChannel(_hostname, _port))
         .withInterceptors(MetadataUtils.newAttachHeadersInterceptor(metadata))
         .withDeadlineAfter(_deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
-        .open(_statusObserver);
+        .open(responseObserver);
   }
 
   protected void sendContent(ByteString byteString, boolean waitForMore) {
+    sendContent(byteString, waitForMore, false);
+  }
+
+  protected void sendContent(ByteString byteString, boolean waitForMore, boolean bypassReady) {
+    if (!awaitReady(bypassReady)) {
+      // Either the mailbox was cancelled while we were waiting (normal path) or the gRPC stream is already dead
+      // (bypass path). Either way, skip the send.
+      return;
+    }
+    // Narrow-window race mitigation: a concurrent cancel() may have run between awaitReady() returning true and
+    // here, setting _senderSideClosed and pushing its own error EOS. If we proceed, both threads would call
+    // onNext() on the same non-thread-safe ClientCallStreamObserver. Re-checking after the gate reduces (but
+    // does not fully eliminate) that window; fully eliminating it would require serializing all onNext() calls
+    // under _readyLock, which is more invasive. The bypass path (cancel/close) must push through regardless,
+    // so this guard only applies when bypassReady == false.
+    if (!bypassReady && isTerminated()) {

Review Comment:
   You're right — the narrow re-check on its own wasn't enough, and this was 
independently flagged by @yashmayya in #3269229790. The full fix landed in 
commit `c39e12f1a4` ("Fix data race on _contentObserver by serializing all 
outbound calls under `_readyLock`"), after the diff this thread was posted 
against.
   
   What's there now: `_readyLock` (the existing `ReentrantLock` that already 
guards `_readyCond`) serializes **every** call to `_contentObserver.onNext` / 
`onCompleted`, across all three call sites:
   
   * `sendContent` (`GrpcSendingMailbox.java:372`) — lock spans the 
post-`awaitReady` `isTerminated()` re-check and the `onNext`.
   * `send(MseBlock.Eos, …)` (`:142`) — lock spans `_senderSideClosed = true` + 
`onCompleted` so the success-path close is atomic against any racing 
`sendContent`.
   * `cancel(Throwable)` (`:268`) — lock spans `processAndSend(errorBlock, 
bypassReady=true)` + `onCompleted`. The inner `sendContent` reacquires the same 
`ReentrantLock` via same-thread reentry.
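   
   For readers following along, the locking shape described in the list above 
can be sketched roughly as follows. This is a simplified illustration, not the 
code in the PR: `SerializedSenderSketch`, the `String` payloads, the `events()` 
helper, and the early-return guard in `cancel` are all hypothetical, and a 
plain list stands in for the non-thread-safe `_contentObserver`.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.locks.ReentrantLock;

   // Hypothetical stand-in for the sender; only the locking shape mirrors the comment above.
   class SerializedSenderSketch {
     private final ReentrantLock _readyLock = new ReentrantLock();
     // Stands in for the non-thread-safe _contentObserver: records outbound calls in order.
     private final List<String> _outbound = new ArrayList<>();
     private volatile boolean _senderSideClosed;

     // Returns true if the payload was pushed, false if the send was skipped.
     boolean sendContent(String payload, boolean bypassReady) {
       _readyLock.lock();
       try {
         // The terminated re-check and the onNext happen under the same lock,
         // so a concurrent cancel() cannot slip in between them.
         if (!bypassReady && _senderSideClosed) {
           return false;
         }
         _outbound.add("onNext:" + payload);
         return true;
       } finally {
         _readyLock.unlock();
       }
     }

     void cancel(String reason) {
       _readyLock.lock();
       try {
         if (_senderSideClosed) {
           return; // assumption of this sketch: cancelling twice is a no-op
         }
         _senderSideClosed = true;
         // Same-thread reentry: sendContent re-acquires the ReentrantLock we already hold.
         sendContent("error:" + reason, true);
         _outbound.add("onCompleted");
       } finally {
         _readyLock.unlock();
       }
     }

     // Snapshot of the recorded outbound calls, taken under the lock.
     List<String> events() {
       _readyLock.lock();
       try {
         return new ArrayList<>(_outbound);
       } finally {
         _readyLock.unlock();
       }
     }
   }
   ```
   
   Because the closed-flag write, the error send, and `onCompleted` all sit 
inside one critical section, a racing `sendContent` either completes before 
the cancel starts or observes the closed flag and skips.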
   
   `awaitReady` itself stays outside the lock: its slow path acquires 
`_readyLock` internally to call `_readyCond.await`, and the fast 
`isReady()`-true path remains lock-free. The uncontended `lock()`/`unlock()` 
pair added to each send costs on the order of tens of nanoseconds, which is 
negligible next to the cost of `onNext`, with no measurable benchmark impact.
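   
   A minimal sketch of that fast-path/slow-path split, under stated 
assumptions: `ReadyGateSketch`, its two volatile flags (standing in for 
`isReady()` and `isTerminated()`), and the explicit timeout parameter are 
hypothetical and are not taken from the PR.
   
   ```java
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.Condition;
   import java.util.concurrent.locks.ReentrantLock;

   // Hypothetical gate; only the lock-free fast path and locked slow path mirror the text.
   class ReadyGateSketch {
     private final ReentrantLock _readyLock = new ReentrantLock();
     private final Condition _readyCond = _readyLock.newCondition();
     private volatile boolean _ready;       // stands in for isReady()
     private volatile boolean _terminated;  // stands in for isTerminated()

     // Returns true if the caller may send, false on termination, timeout, or interrupt.
     boolean awaitReady(long timeoutMs) {
       // Fast path: no lock is taken when the answer is already known.
       if (_terminated) {
         return false;
       }
       if (_ready) {
         return true;
       }
       // Slow path: park on the condition, re-checking the predicate after every wake-up.
       long remainingNs = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
       _readyLock.lock();
       try {
         while (!_ready && !_terminated) {
           if (remainingNs <= 0) {
             return false;
           }
           remainingNs = _readyCond.awaitNanos(remainingNs);
         }
         return !_terminated;
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         return false;
       } finally {
         _readyLock.unlock();
       }
     }

     void setReady(boolean ready) {
       _ready = ready;
       if (ready) {
         wakeWaiters();
       }
     }

     void terminate() {
       _terminated = true;
       wakeWaiters();
     }

     // Signalling requires holding the lock that owns the condition.
     void wakeWaiters() {
       _readyLock.lock();
       try {
         _readyCond.signalAll();
       } finally {
         _readyLock.unlock();
       }
     }
   }
   ```
   
   The waiter checks the predicate while holding the lock and the signaller 
takes the same lock before `signalAll`, so a flag flip between the check and 
the `awaitNanos` call cannot be missed.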
   
   The "narrow-window mitigation" comment you were referring to has been 
removed; class-level Javadoc now carries the new invariant:
   
   ```java
   // _readyLock serializes every call to _contentObserver.onNext / onCompleted.
   // Acquire it around any new outbound call site you add.
   ```
   
   Tests in `GrpcSendingMailboxTest` (including `testRemoteCancelledBySender*`, 
which specifically exercises the cancel-vs-send race) still pass.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

