BiteTheDDDDt opened a new pull request, #61617:
URL: https://github.com/apache/doris/pull/61617

   This pull request introduces a shared, atomic row limit mechanism in the 
scanner context to ensure that concurrent scanners collectively respect the SQL 
`LIMIT` clause. The main changes implement a thread-safe, centrally managed 
quota for remaining rows, preventing over-scanning and efficiently coordinating 
concurrent scanner threads. Additionally, related logic is updated to stop or 
throttle scanners when the quota is exhausted and to provide improved debug 
information.
   
   **Shared limit management and enforcement:**
   
   * Added a new atomic member `_remaining_limit` to `ScannerContext`, 
representing the shared remaining row limit across all scanners, and 
initialized it appropriately. Provided an `acquire_limit_quota()` method for 
atomically claiming rows from this quota. 
   * Updated scanner scheduling logic to check and respect the shared limit 
before launching new scan tasks or continuing scanning, ensuring no new work is 
scheduled if the limit is exhausted. 
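
The atomic claim described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the `LimitQuota` class, the `remaining()` accessor, the `int64_t` signature, and the convention that a negative value means "no LIMIT" are all assumptions made for illustration. Only the names `_remaining_limit` and `acquire_limit_quota()` come from the description.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared state held by ScannerContext.
class LimitQuota {
public:
    // Assumption: a negative limit means the query has no LIMIT clause.
    explicit LimitQuota(int64_t limit) : _remaining_limit(limit) {}

    // Try to claim up to `desired` rows from the shared quota.
    // Returns the number of rows actually granted (0 once exhausted).
    int64_t acquire_limit_quota(int64_t desired) {
        assert(desired >= 0);
        int64_t remaining = _remaining_limit.load(std::memory_order_acquire);
        while (true) {
            if (remaining < 0) {
                return desired; // unlimited: grant everything
            }
            if (remaining == 0) {
                return 0; // quota already exhausted
            }
            const int64_t granted = std::min(desired, remaining);
            // On failure, compare_exchange_weak reloads `remaining`
            // with the current value, so the loop simply retries.
            if (_remaining_limit.compare_exchange_weak(
                        remaining, remaining - granted,
                        std::memory_order_acq_rel,
                        std::memory_order_acquire)) {
                return granted;
            }
        }
    }

    // Hypothetical accessor, e.g. for scheduling checks or debug output.
    int64_t remaining() const {
        return _remaining_limit.load(std::memory_order_acquire);
    }

private:
    std::atomic<int64_t> _remaining_limit;
};
```

A compare-and-swap loop is used here instead of a plain `fetch_sub` so that the counter never drops below zero when several scanners race for the last rows. A scheduler could then skip submitting new scan tasks whenever `remaining()` returns 0.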
   
   **Block quota enforcement and block truncation:**
   
   * In the scanner execution loop, after reading a block, each scanner now 
atomically acquires quota for the number of rows in the block. If the quota 
covers only part of the block, the block is truncated to the granted row count; 
if no quota remains, the block is discarded. In either case the scanner stops 
scanning.
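
The truncate-or-discard step can be sketched as below. This is illustrative only: the block is modelled as a `std::vector<int>` of rows and `apply_quota` is a hypothetical helper, neither of which is taken from the Doris code. `granted` is assumed to be the row count returned by the quota acquisition for this block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: shrink a freshly read block to the granted row
// count. Returns true if the scanner may keep scanning, false if the
// quota ran out while handling this block.
bool apply_quota(std::vector<int>& block_rows, int64_t granted) {
    const auto rows = static_cast<int64_t>(block_rows.size());
    if (granted >= rows) {
        // The whole block fits. If this used up the last of the quota,
        // the next acquisition returns 0 and the scanner stops then.
        return true;
    }
    // Partial grant: keep only the first `granted` rows.
    // Zero grant: the block becomes empty, i.e. it is discarded.
    block_rows.resize(static_cast<std::size_t>(std::max<int64_t>(granted, 0)));
    return false;
}
```

With a 5-row block and a grant of 2, the block is cut to its first 2 rows and the helper returns `false`, which tells the scanner to stop.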
   
   **Completion and lifecycle management:**
   
   * Modified context completion logic to also mark the scan as finished if the 
shared limit is exhausted and no scanners are running, ensuring correct query 
termination.
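
One way to read the completion rule is as a predicate over a snapshot of the context state. The sketch below is an assumption-laden illustration: `ScanSnapshot`, its fields, and `is_scan_finished` are hypothetical names, and the "all scanners done" branch stands in for whatever completion condition already existed before this change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the state consulted at completion time.
struct ScanSnapshot {
    int64_t remaining_limit;  // shared quota; negative means no LIMIT (assumption)
    int running_scanners;     // scanners currently executing
    int pending_scanners;     // scanners not yet scheduled or not yet finished
};

// The scan is finished either when every scanner is done (assumed
// pre-existing rule) or when the shared limit is exhausted and nothing
// is still running (the rule described above).
bool is_scan_finished(const ScanSnapshot& s) {
    const bool all_done = s.running_scanners == 0 && s.pending_scanners == 0;
    const bool limit_hit = s.remaining_limit == 0 && s.running_scanners == 0;
    return all_done || limit_hit;
}
```

Waiting for `running_scanners` to reach zero in the limit branch reflects the description's requirement that no scanner is still running when the scan is marked finished.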
   
   **Debugging and observability:**
   
   * Enhanced the `debug_string()` output of `ScannerContext` to include the 
current value of `remaining_limit`, aiding in diagnostics and monitoring.
   
   **Small-limit optimization:**
   
   * Retained and clarified the optimization for scanners with a small 
per-scanner limit, ensuring they return early to avoid unnecessary data 
scanning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


