gortiz commented on PR #18649:
URL: https://github.com/apache/pinot/pull/18649#issuecomment-4624174254

   The polling chain this PR introduces is functional but has a cost-multiplier 
property that becomes significant at scale. Flagging it here as a design 
discussion point rather than a blocker, since fixing it is a larger change that 
can be done in a follow-up.
   
   **The problem with polling**
   
   When a client calls `GET /clientQuery/{id}/progress`, the chain is:
   
   ```
   client → controller (fan-out to all brokers)
          → broker     (fan-out HTTP to all servers, SSE; or gRPC Progress RPC, 
MSE)
          → servers    (Guava cache lookup, return JSON)
   ```
   
   For a 30-second query with a 1-second poll interval and 3 servers:
   
   ```
   30 client ticks
     → 30 × N controller→broker calls (N = brokers; fan-out to find the right 
one)
     → 30 × 3 broker→server calls
     = 90+ network calls whose only content is a tiny JSON payload
   ```
   
   At 100 concurrent queries: ~9,000 extra calls/minute. Each tick also 
requires the server to have live progress state accessible at any time (the 
Guava caches in `OpChainSchedulerService` and `InstanceRequestHandler`). The 
caches are sized and timed to stay alive long enough to answer the next poll — 
which introduces the eviction races noted in other comments.
   
   **A push alternative**
   
   The natural fix is to flip the direction: client opens one persistent 
connection, broker pushes events as they arrive.
   
   ```
   Client                  Broker                       Servers
     │                       │                             │
     │── GET /query/X/stream ►│                             │
     │  (SSE, stays open)    │                             │
     │                       │◄─ OpChainComplete (gRPC) ───│  ← already flowing 
via #18458
     │◄── data: {rows:12M} ──│                             │
     │                       │◄─ OpChainComplete (gRPC) ───│
     │◄── data: {rows:34M} ──│                             │
     │                       │◄─ OpChainComplete (gRPC) ───│
     │◄── data: {rows:100M} ─│                             │
     │◄── event: complete ───│  (stream closes)            │
   ```
   
   Cost for the same 30-second query:
   
   ```
   1 client connection (open once, reused throughout)
   0 new controller→broker calls
   0 new broker→server calls  ← #18458's SubmitWithStream already delivers this 
data
   ```
   
   The broker SSE endpoint just fans out events it already holds in 
`StreamingQuerySession`. No additional Guava caches on servers. No eviction 
races. No controller fan-out per tick.
   
   **Why this is feasible with #18458 in place**
   
   #18458 introduces a long-lived gRPC bidi channel (`SubmitWithStream`) 
between broker and servers that stays open for the query duration. The broker's 
`StreamingQuerySession` already accumulates per-op-chain stats as they 
complete. The missing piece is an outbound channel from broker to client. SSE 
provides exactly that with standard JAX-RS (`SseEventSink`):
   
   ```java
   // In StreamingQuerySession, when OpChainComplete arrives (already called by 
#18458):
   public void onOpChainComplete(...) {
       mergeStats(...);                     // existing #18458 logic
       broadcastProgressSnapshot();         // new: push to SSE subscribers
   }
   
   // New broker endpoint:
   @GET @Path("query/{id}/progress/stream") @Produces(SERVER_SENT_EVENTS)
   public void streamProgress(@PathParam("id") long queryId, @Context 
SseEventSink sink) {
       _queryDispatcher.subscribeProgressStream(queryId, sink);
   }
   ```
   
   Connection cleanup is handled automatically: when the SSE connection drops, 
`sink.isClosed()` returns true and the subscriber is removed on the next push 
attempt. When the query completes, the broker sends a final event with 
`complete: true` and closes the stream.
   
   **What this PR should do now**
   
   This is a non-trivial change that I wouldn't block the current PR on. But it 
would be worth:
   
   1. Keeping the polling endpoint as-is (it's correct and useful for 
languages/clients that can't hold persistent connections)
   2. Adding a `GET /query/{id}/progress/stream` SSE endpoint alongside it in a 
follow-up
   3. Having the CLI and Query Console prefer SSE when available
   
   The main thing to avoid is designing the server-side state (Guava caches, 
eviction timings) in a way that makes it hard to remove when the push path 
lands. The ref-counted `_executionContextByRequest` from #18458 is already the 
right shape for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to