Re: [I] [VL] The time taken to merge payload during the shuffle write is excessively high [incubator-gluten]

via GitHub Tue, 08 Jul 2025 23:34:28 -0700


FelixYBW commented on issue #10104:
URL: https://github.com/apache/incubator-gluten/issues/10104#issuecomment-3051321950


   @NEUpanning Does your data follow a pattern where only one or a few reducer
partitions are filled during the split?
   
   Here we size the destination row vector by dividing the available memory by
the number of reducers. The assumption is that rows are distributed evenly across
the destination partitions. If the data is sorted or skewed, performance will be
poor.
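   To make the even-split assumption concrete, here is a small Python model. It is
a sketch, not Gluten's code: the 64 MiB budget, 1000 reducers, 64-byte rows and
10M input rows are made-up numbers, the function names are hypothetical, and it
assumes each full (or final partial) buffer is flushed as one payload.

```python
def even_split_capacity_rows(available_bytes: int, num_partitions: int, bytes_per_row: int) -> int:
    """Rows one partition buffer can hold when memory is split evenly across all reducers."""
    return available_bytes // num_partitions // bytes_per_row


def payloads_per_partition(rows_per_partition: list[int], capacity_rows: int) -> list[int]:
    """Assumption: every full or final partial buffer becomes one payload (ceiling division)."""
    return [(rows + capacity_rows - 1) // capacity_rows for rows in rows_per_partition]


# Hypothetical workload parameters.
AVAILABLE = 64 * 1024 * 1024  # 64 MiB memory budget
PARTITIONS = 1000             # reducer count
ROW_BYTES = 64                # fixed row width
TOTAL_ROWS = 10_000_000

cap = even_split_capacity_rows(AVAILABLE, PARTITIONS, ROW_BYTES)

even = [TOTAL_ROWS // PARTITIONS] * PARTITIONS       # rows spread evenly
skewed = [TOTAL_ROWS] + [0] * (PARTITIONS - 1)       # all rows hit one partition

print(cap)                                           # 1048 rows per buffer
print(max(payloads_per_partition(even, cap)))        # 10 payloads per partition
print(max(payloads_per_partition(skewed, cap)))      # 9542 payloads in the hot partition
print(cap * ROW_BYTES / AVAILABLE)                   # ~0.001 of the budget actually in use
```

   Under these assumptions, the skewed input gets the same tiny per-partition
buffer as the even input, so the single hot partition is flushed into thousands
of small payloads while roughly 99.9% of the memory budget sits unused.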
   
   The solution is to size the vector based on a more precise reducer count.
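   A minimal sketch of that idea follows. It assumes "a more precise reducer
count" means the number of partitions that actually received rows; that reading,
the helper names and the workload numbers (64 MiB, 1000 reducers, 64-byte rows,
10M rows all routed to one partition) are hypothetical, and this is not Gluten's
implementation.

```python
def active_partition_count(rows_per_partition: list[int]) -> int:
    """Hypothetical helper: number of reducer partitions that received at least one row."""
    return sum(1 for rows in rows_per_partition if rows > 0)


def capacity_rows(available_bytes: int, rows_per_partition: list[int],
                  bytes_per_row: int, use_active_count: bool) -> int:
    """Per-partition buffer capacity, dividing memory by all reducers or only active ones."""
    if use_active_count:
        n = active_partition_count(rows_per_partition)
    else:
        n = len(rows_per_partition)
    n = max(n, 1)  # guard against an empty batch
    return available_bytes // n // bytes_per_row


def hot_partition_payloads(rows_per_partition: list[int], cap: int) -> int:
    """Payloads for the busiest partition, assuming one payload per full/partial buffer."""
    return max((rows + cap - 1) // cap for rows in rows_per_partition)


AVAILABLE = 64 * 1024 * 1024
ROW_BYTES = 64
skewed = [10_000_000] + [0] * 999  # 1000 reducers, only one receives data

by_all = capacity_rows(AVAILABLE, skewed, ROW_BYTES, use_active_count=False)
by_active = capacity_rows(AVAILABLE, skewed, ROW_BYTES, use_active_count=True)

print(by_all, hot_partition_payloads(skewed, by_all))        # 1048 9542
print(by_active, hot_partition_payloads(skewed, by_active))  # 1048576 10
```

   One design caveat: the active count is only known after rows have been routed,
so a real implementation would presumably need to rebalance buffer sizes when
previously empty partitions start receiving data.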


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

