BiteTheDDDDt opened a new pull request, #61988:
URL: https://github.com/apache/doris/pull/61988

   This pull request handles the case where a bucketed OLAP table scan has more 
local shuffle instances than physical buckets. It introduces virtual buckets and 
expands the bucket space to match, so that every instance is assigned data. The 
most important changes are:
   
   **Bucket assignment logic improvements:**
   
   * In `UnassignedScanBucketOlapTableJob.java`, when the number of instances 
exceeds the number of physical buckets, virtual bucket indexes are assigned to 
the extra instances. This guarantees that every instance receives data during 
local shuffle, preventing idle instances.
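The assignment idea can be sketched as follows. This is a minimal sketch, assuming bucket indexes are handed to instances round-robin. The class and method names (`VirtualBucketSketch`, `assignBuckets`, `isVirtual`) are hypothetical and are not taken from `UnassignedScanBucketOlapTableJob.java`. The sketch only shows why extending the bucket space to the instance count leaves no instance without a bucket.

```java
import java.util.ArrayList;
import java.util.List;

public class VirtualBucketSketch {

    // Hypothetical helper, not the code in UnassignedScanBucketOlapTableJob.java.
    // Returns, for each instance, the bucket indexes it owns. The bucket space is
    // max(physicalBuckets, instanceCount); indexes >= physicalBuckets are virtual.
    static List<List<Integer>> assignBuckets(int physicalBuckets, int instanceCount) {
        if (physicalBuckets <= 0 || instanceCount <= 0) {
            throw new IllegalArgumentException("bucket and instance counts must be positive");
        }
        int totalBuckets = Math.max(physicalBuckets, instanceCount);
        List<List<Integer>> perInstance = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            perInstance.add(new ArrayList<>());
        }
        // Round-robin: bucket b goes to instance b % instanceCount. When there are
        // more instances than physical buckets, each extra instance receives
        // exactly one virtual bucket, so no instance is left empty.
        for (int b = 0; b < totalBuckets; b++) {
            perInstance.get(b % instanceCount).add(b);
        }
        return perInstance;
    }

    // A bucket index is virtual if it lies beyond the physical bucket range.
    static boolean isVirtual(int bucket, int physicalBuckets) {
        return bucket >= physicalBuckets;
    }

    public static void main(String[] args) {
        // 3 physical buckets, 5 local shuffle instances: buckets 3 and 4 are virtual.
        System.out.println(assignBuckets(3, 5)); // [[0], [1], [2], [3], [4]]
        // 4 physical buckets, 2 instances: no virtual buckets are needed.
        System.out.println(assignBuckets(4, 2)); // [[0, 2], [1, 3]]
    }
}
```

When instances do not outnumber physical buckets, the bucket space is unchanged and the mapping reduces to ordinary round-robin. Virtual indexes only appear when they are needed to cover the extra instances.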
   
   **Thrift parameter updates:**
   
   * In `ThriftPlansBuilder.java`, the `num_buckets` parameter is set to the 
maximum of the physical bucket count and the number of worker instances. This 
ensures that the hash distribution covers both physical and virtual buckets, so 
data can be routed to every instance.
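The routing side can be sketched in the same way. Here `numBuckets` mirrors the rule described for `num_buckets`, which is the maximum of the physical bucket count and the instance count. Several parts of the sketch are illustrative assumptions rather than the engine's actual hash or mapping: the `route` and `rowsPerInstance` helpers, the `bucket % instanceCount` bucket-to-instance mapping, and the use of `hashCode()` with `Math.floorMod`. The example compares hashing over the physical buckets alone with hashing over the expanded bucket space.

```java
import java.util.Arrays;

public class VirtualBucketRoutingSketch {

    // Mirrors the described num_buckets rule: cover physical and virtual buckets.
    static int numBuckets(int physicalBuckets, int instanceCount) {
        return Math.max(physicalBuckets, instanceCount);
    }

    // Routes a key to a bucket. Java's hashCode plus floorMod is only a stand-in
    // for whatever hash function the engine actually uses.
    static int route(Object key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Counts how many of the keys 0..rows-1 reach each instance, assuming
    // (hypothetically) that bucket b is owned by instance b % instanceCount.
    static int[] rowsPerInstance(int rows, int numBuckets, int instanceCount) {
        int[] counts = new int[instanceCount];
        for (int k = 0; k < rows; k++) {
            int bucket = route(k, numBuckets);
            counts[bucket % instanceCount]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int physical = 3;
        int instances = 5;
        // Hashing over physical buckets only: instances 3 and 4 never get rows.
        System.out.println(Arrays.toString(rowsPerInstance(100, physical, instances)));
        // prints [34, 33, 33, 0, 0]
        // Hashing over max(physical, instances): every instance gets rows.
        System.out.println(Arrays.toString(
                rowsPerInstance(100, numBuckets(physical, instances), instances)));
        // prints [20, 20, 20, 20, 20]
    }
}
```

Setting the bucket count to the instance count only when it exceeds the physical count keeps the ordinary case unchanged. Extra instances get work only in the situation this PR targets.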


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

