BiteTheDDDDt opened a new pull request, #61988: URL: https://github.com/apache/doris/pull/61988
This pull request addresses the scenario where the number of local shuffle instances exceeds the number of physical buckets in bucketed OLAP table scans. The changes ensure that all instances are properly assigned data by introducing the concept of virtual buckets and expanding the bucket space accordingly. The most important changes are:

**Bucket assignment logic improvements:**

* In `UnassignedScanBucketOlapTableJob.java`, when the number of instances exceeds the number of physical buckets, virtual bucket indexes are assigned to the extra instances. This guarantees that every instance receives data during local shuffle, preventing idle instances.

**Thrift parameter updates:**

* In `ThriftPlansBuilder.java`, the `num_buckets` parameter is set to the maximum of the physical bucket count and the number of worker instances. This ensures that the hash distribution covers both physical and virtual buckets, so data can be routed to every instance.
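The idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request: the class `VirtualBucketSketch` and the methods `effectiveNumBuckets` and `assignBuckets` are hypothetical names, and the round-robin mapping of bucket indexes to instances is an assumption made only for illustration. The sketch shows how widening the bucket space to `max(physicalBuckets, instanceCount)` leaves no instance without a bucket index.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the round-robin policy are assumptions,
// not the actual Doris implementation.
class VirtualBucketSketch {

    // Mirrors the rule described for num_buckets: the larger of the physical
    // bucket count and the number of worker instances.
    static int effectiveNumBuckets(int physicalBuckets, int instanceCount) {
        return Math.max(physicalBuckets, instanceCount);
    }

    // Returns, for each instance, the bucket indexes it owns.
    // Indexes >= physicalBuckets are "virtual": no physical bucket backs them,
    // but they give the extra instances a target in the hash distribution.
    static List<List<Integer>> assignBuckets(int physicalBuckets, int instanceCount) {
        if (physicalBuckets <= 0 || instanceCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        List<List<Integer>> perInstance = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            perInstance.add(new ArrayList<>());
        }
        int totalBuckets = effectiveNumBuckets(physicalBuckets, instanceCount);
        // Assumed policy: spread bucket indexes round-robin over instances.
        for (int bucket = 0; bucket < totalBuckets; bucket++) {
            perInstance.get(bucket % instanceCount).add(bucket);
        }
        return perInstance;
    }
}
```

Under these assumptions, 3 physical buckets and 5 instances yield a bucket space of 5: instances 0 to 2 own the physical buckets 0 to 2, and instances 3 and 4 own the virtual indexes 3 and 4. When the physical bucket count is already at least the instance count, the bucket space is unchanged and no virtual indexes are created.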
