yashmayya opened a new pull request, #18620:
URL: https://github.com/apache/pinot/pull/18620

   ## Summary
   
   The per-segment group-by scan loop in 
`DefaultGroupByExecutor#process(ValueBlock)` had no resource-usage sampling or 
termination check. As a result, a heavy `GROUP BY` could scan hundreds of 
millions of rows and grow a large group-by hash table while:
   
   - its memory footprint was not sampled for the query accountant during the 
scan (sampling only resumed later, when the `DataTable` was built), so per-query 
memory tracking lagged behind actual allocation, and
   - it could not respond to cancellation or a query timeout mid-scan, since the 
only termination check on this path fired once, before the loop started.
   
   This change adds a single per-block 
`QueryThreadContext.checkTerminationAndSampleUsage(...)` call at the start of 
`process()`. It samples usage and checks for termination once per block 
(`MAX_DOC_PER_CALL` rows), which:
   
   - keeps the query's tracked memory footprint fresh as the hash table grows 
across the scan, improving accounting accuracy for the OOM-protection 
framework, and
   - lets a cancelled or timed-out query bail out of a long-running aggregation 
instead of running to completion.
   
   The call sits in `process()`, so it covers `GroupByOperator`, 
`FilteredGroupByOperator`, and `StarTreeGroupByExecutor` (which inherits 
`process()`). It is invoked once per block (not per row), so the overhead is 
negligible and matches the existing periodic-sampling pattern used elsewhere on 
the query path.
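
As a rough illustration of the per-block pattern described above, here is a minimal, self-contained Java sketch. `SketchQueryContext` and `SketchGroupByExecutor` are hypothetical stand-ins, not Pinot classes. The actual change calls `QueryThreadContext.checkTerminationAndSampleUsage(...)`, whose arguments and exception types are not reproduced here; the sketch throws a plain `IllegalStateException` instead, and the block size is passed in by the caller rather than taken from `MAX_DOC_PER_CALL`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the per-query context (not Pinot's QueryThreadContext).
class SketchQueryContext {
  final AtomicBoolean cancelled = new AtomicBoolean(false);
  final long deadlineMs;
  int samples = 0; // number of usage samples taken so far

  SketchQueryContext(long deadlineMs) {
    this.deadlineMs = deadlineMs;
  }

  // Called once per block: record a usage sample, then stop if the query must end.
  void checkTerminationAndSampleUsage(String scope) {
    samples++;
    if (cancelled.get()) {
      throw new IllegalStateException("Query cancelled in " + scope);
    }
    if (System.currentTimeMillis() > deadlineMs) {
      throw new IllegalStateException("Query timed out in " + scope);
    }
  }
}

// Hypothetical executor that counts rows per group key, one block at a time.
class SketchGroupByExecutor {
  final Map<Integer, Long> counts = new HashMap<>();
  final SketchQueryContext ctx;

  SketchGroupByExecutor(SketchQueryContext ctx) {
    this.ctx = ctx;
  }

  // Processes keys[start, end). The check runs once per block, not once per row.
  void process(int[] keys, int start, int end) {
    ctx.checkTerminationAndSampleUsage("SketchGroupByExecutor#process");
    for (int i = start; i < end; i++) {
      counts.merge(keys[i], 1L, Long::sum);
    }
  }

  // Drives the scan by splitting the input into fixed-size blocks.
  static Map<Integer, Long> scan(SketchQueryContext ctx, int[] keys, int blockSize) {
    SketchGroupByExecutor executor = new SketchGroupByExecutor(ctx);
    for (int start = 0; start < keys.length; start += blockSize) {
      int end = Math.min(start + blockSize, keys.length);
      executor.process(keys, start, end);
    }
    return executor.counts;
  }
}
```

In this sketch, a scan over 25 keys with a block size of 10 takes three samples, one per block. Setting `cancelled` before the scan makes the first `process()` call throw before any row is aggregated.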
   
   ## Testing
   
   New `DefaultGroupByExecutorTest` builds a real multi-block segment and 
verifies that `process()`:
   
   - samples usage exactly once per block,
   - throws `TerminationException` when the query is explicitly cancelled, and
   - throws an `EXECUTION_TIMEOUT` `QueryException` when the query deadline has 
passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

