BiteTheDDDDt opened a new pull request, #61617: URL: https://github.com/apache/doris/pull/61617
This pull request introduces a shared, atomic row limit mechanism in the scanner context to ensure that concurrent scanners collectively respect the SQL `LIMIT` clause. The main changes implement a thread-safe, centrally managed quota for remaining rows, preventing over-scanning and efficiently coordinating concurrent scanner threads. Additionally, related logic is updated to stop or throttle scanners when the quota is exhausted and to provide improved debug information.

**Shared limit management and enforcement:**

* Added a new atomic member `_remaining_limit` to `ScannerContext`, representing the shared remaining row limit across all scanners, and initialized it appropriately. Provided an `acquire_limit_quota()` method for atomically claiming rows from this quota. [[1]](diffhunk://#diff-0c9a817d45d8130ea3211189e1321d1275e22fd4a9a3fac2bd707b1cfeefa5e5R74) [[2]](diffhunk://#diff-0c9a817d45d8130ea3211189e1321d1275e22fd4a9a3fac2bd707b1cfeefa5e5R99) [[3]](diffhunk://#diff-3049f42cade971254aae07ced700d9a10b2505b03da743efea3270e63bd88dceR224-R229)
* Updated scanner scheduling logic to check and respect the shared limit before launching new scan tasks or continuing scanning, ensuring no new work is scheduled if the limit is exhausted. [[1]](diffhunk://#diff-0c9a817d45d8130ea3211189e1321d1275e22fd4a9a3fac2bd707b1cfeefa5e5R639-R643) [[2]](diffhunk://#diff-ecdf52f3fb33b9018cc1aff92085e470087071b25d79efed8a849a289215d05fR235-R239)

**Block quota enforcement and block truncation:**

* In the scanner execution loop, after reading a block, scanners now atomically acquire quota for the number of rows in the block. If quota is exhausted, the block is discarded or truncated to the permitted row count, and scanning stops.

**Completion and lifecycle management:**

* Modified context completion logic to also mark the scan as finished if the shared limit is exhausted and no scanners are running, ensuring correct query termination.

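
To illustrate the quota mechanism described above, here is a minimal standalone sketch of how an atomic "claim up to N rows" operation could work. It is not the code from this PR: the class name `SharedLimitQuota`, the helper `rows_to_keep`, and the signature and return convention of `acquire` are all assumptions made for illustration. Only the member name `_remaining_limit` comes from the description.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the shared quota held by ScannerContext.
// Only the member name _remaining_limit is taken from the PR description.
class SharedLimitQuota {
public:
    explicit SharedLimitQuota(int64_t limit) : _remaining_limit(limit) {}

    // Tries to claim up to `desired` rows from the shared quota.
    // Returns the number of rows granted; 0 means the quota is exhausted.
    int64_t acquire(int64_t desired) {
        if (desired <= 0) {
            return 0;
        }
        int64_t remaining = _remaining_limit.load(std::memory_order_acquire);
        while (remaining > 0) {
            const int64_t granted = std::min(desired, remaining);
            // On failure, compare_exchange_weak stores the current value
            // into `remaining`, so the loop retries with fresh data.
            if (_remaining_limit.compare_exchange_weak(
                        remaining, remaining - granted,
                        std::memory_order_acq_rel, std::memory_order_acquire)) {
                return granted;
            }
        }
        return 0;
    }

    int64_t remaining() const {
        return _remaining_limit.load(std::memory_order_acquire);
    }

private:
    std::atomic<int64_t> _remaining_limit;
};

// Assumed helper: given a block of `block_rows` rows, returns how many rows
// the scanner may keep. 0 means discard the block; a value smaller than
// block_rows means truncate the block and stop scanning.
inline int64_t rows_to_keep(SharedLimitQuota& quota, int64_t block_rows) {
    return quota.acquire(block_rows);
}
```

With a limit of 10 and blocks of 4 rows, successive calls to `rows_to_keep` in this sketch grant 4, 4, 2, and then 0 rows, which corresponds to the keep, truncate, and discard cases in the description. A compare-and-swap loop is used here rather than a plain `fetch_sub` so that the counter never goes negative under contention; whether the PR makes the same choice is not stated.
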
**Debugging and observability:**

* Enhanced the `debug_string()` output of `ScannerContext` to include the current value of `remaining_limit`, aiding in diagnostics and monitoring.

**Small-limit optimization:**

* Retained and clarified the optimization for scanners with a small per-scanner limit, ensuring they return early to avoid unnecessary data scanning.
