caiconghui opened a new issue, #16038: URL: https://github.com/apache/doris/issues/16038
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Description

We created a table with 50 buckets on a Doris cluster with 50 BE nodes and ran a query benchmark: we opened 50 connections and repeatedly executed a query such as `select count(1) from test`. CPU usage across the cluster was unbalanced: the most heavily loaded nodes ran at nearly 100%, the least loaded at only about 30%, and some nodes received no fragment to execute at all. We need a better way to schedule scan nodes.

Once the location of each replica is determined, the scan node scheduling problem for a query can be described as a minimum-cost maximum-flow problem:

<img width="1237" alt="image" src="https://user-images.githubusercontent.com/55968745/213057900-aa14d395-bd99-4293-90f2-17a267690374.png">

### Solution

Finding the most evenly distributed assignment for every query is very expensive, so we add a new strategy that gives high priority to nodes with few replicas when scheduling scan nodes.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
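
The strategy proposed under "Solution" can be illustrated with a small greedy sketch. This is not the Doris implementation; it is one possible reading of "give high priority to nodes with few replicas". The function name `assign_scan_ranges`, the input shape (tablet id mapped to the backends holding a replica), and the tie-breaking order are all assumptions made for illustration.

```python
from collections import Counter


def assign_scan_ranges(replica_locations):
    """Pick one backend per tablet (hypothetical helper, not a Doris API).

    replica_locations: dict mapping tablet id -> list of backend ids
    that hold a replica of that tablet.
    Returns a dict mapping tablet id -> chosen backend id.
    """
    # Number of replicas of the scanned tablets hosted by each backend.
    replica_count = Counter(
        be for backends in replica_locations.values() for be in backends
    )
    assigned = Counter()  # scans already assigned to each backend
    plan = {}
    for tablet, candidates in replica_locations.items():
        # Prefer the backend with the fewest scans assigned so far.
        # On a tie, prefer the backend hosting fewer replicas (assumed
        # reading of the issue): it has fewer later chances to be picked.
        chosen = min(
            candidates,
            key=lambda be: (assigned[be], replica_count[be], be),
        )
        assigned[chosen] += 1
        plan[tablet] = chosen
    return plan


# be1 holds replicas of both tablets, be2 only of t1.
example = {"t1": ["be1", "be2"], "t2": ["be1"]}
plan = assign_scan_ranges(example)
print(plan)  # {'t1': 'be2', 't2': 'be1'}
```

In this example, a tie-break that ignores replica counts (for instance, by backend name) would send both tablets to `be1` and leave `be2` idle. Preferring the backend with fewer replicas avoids that without solving the full minimum-cost maximum-flow problem, although a greedy pass like this does not guarantee the most even distribution in general.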
