[I] [Enhancement] Support per query TaskScheduler [doris]

via GitHub Wed, 17 Dec 2025 02:48:54 -0800


yiguolei opened a new issue, #59125:
URL: https://github.com/apache/doris/issues/59125


   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   Current execution model: the BE (Backend) runs a single thread pool whose 
size equals the number of CPU cores. All queries execute in this pool, so no 
query task may block while it runs. A blocked task keeps holding its thread 
(tasks are scheduled cooperatively, much like coroutines), which prevents other 
queries from running. Blocking operations therefore have to be handed off to a 
second, much larger IO thread pool.
   
   In theory, this execution model is elegant as it can reduce CPU overhead 
caused by thread switching in multi-threaded models. However, it has the 
following issues:
   
   - High programming complexity: blocking work such as spilling to disk cannot 
be performed directly inside a task; lazy materialization also involves network 
IO, so it requires code modifications as well.
   - High debugging difficulty: It is impossible to use tools like pstack to 
identify which query is blocked, necessitating the development of numerous 
custom debugging tools.
   - Difficult query isolation: Users sometimes want to limit the resource 
usage of a single query, but this is not feasible under the current model.
   
   One feasible solution, similar to ClickHouse's approach:
   - Launch an extremely large thread pool on the BE (e.g., 100 * CPU cores).
   - When each query is executed, it requests a small thread pool from this 
large pool, where the size of the small thread pool equals the number of CPU 
cores.
   - The current execution method is retained within each small pool.
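   
   The three steps above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not Doris code: `ThreadBudget` and 
`QueryTaskScheduler` are hypothetical names, the "large pool" is modeled as a 
simple counter of available threads, and each query starts its own workers 
rather than borrowing pre-started ones.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical BE-wide budget that caps the total number of query threads
// (for example 100 * CPU cores). Not an existing Doris class.
class ThreadBudget {
public:
    explicit ThreadBudget(std::size_t capacity) : available_(capacity) {}

    // Grants up to `wanted` threads and returns how many were granted.
    std::size_t acquire(std::size_t wanted) {
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t granted = std::min(wanted, available_);
        available_ -= granted;
        return granted;
    }
    void release(std::size_t n) {
        std::lock_guard<std::mutex> lock(mu_);
        available_ += n;
    }
    std::size_t available() {
        std::lock_guard<std::mutex> lock(mu_);
        return available_;
    }

private:
    std::mutex mu_;
    std::size_t available_;
};

// Hypothetical per-query scheduler. It owns a small set of workers reserved
// from the budget, so a blocking task only stalls this query's workers.
class QueryTaskScheduler {
public:
    QueryTaskScheduler(ThreadBudget& budget, std::size_t wanted)
            : budget_(budget), granted_(budget.acquire(wanted)) {
        for (std::size_t i = 0; i < granted_; ++i) {
            workers_.emplace_back([this] { work_loop(); });
        }
    }
    // Drains the remaining tasks, joins the workers, returns the threads.
    ~QueryTaskScheduler() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
        budget_.release(granted_);
    }
    void submit(std::function<void()> task) {
        assert(granted_ > 0 && "no threads were granted to this query");
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
    std::size_t thread_count() const { return granted_; }

private:
    void work_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // may block; only this query's workers are affected
        }
    }

    ThreadBudget& budget_;
    std::size_t granted_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

   In this sketch a query that asks for more threads than the budget has left 
simply gets fewer; a real implementation would need to decide whether to queue, 
reject, or degrade such queries.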
   
   This way, the number of threads used by a query can be set via session 
variables to limit its CPU utilization. Query code no longer needs to be 
refactored to be asynchronous: IO operations can be written directly in the 
existing code, and any blocking affects only the current query itself.
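   
   As a small illustration of the session-variable idea, the per-query pool 
size could be derived like this. The function name, the convention that 0 means 
"unset", and the cap at the core count are all assumptions made for the 
example; the issue itself does not specify them.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: picks the thread count for one query.
// `requested` stands for a session variable (0 is treated as "unset");
// the result is capped at the number of CPU cores, which is an assumption.
std::size_t resolve_query_threads(std::size_t requested, std::size_t cpu_cores) {
    if (cpu_cores == 0) cpu_cores = 1;     // guard against an unknown core count
    if (requested == 0) return cpu_cores;  // default: one thread per core
    return std::min(requested, cpu_cores);
}
```

   For example, on a 16-core BE a query with the variable set to 4 would get 4 
threads under these assumptions, while an unset variable would fall back to 16.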
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

