Re: [PR] [VL] Provide options to combine small batches before sending to shuffle [incubator-gluten]

via GitHub Mon, 10 Jun 2024 21:11:58 -0700


FelixYBW commented on PR #6009:
URL: 
https://github.com/apache/incubator-gluten/pull/6009#issuecomment-2159745003


   > around shuffle split processing. We may want to figure it out later to 
avoid doing such batch coalesce operations that intro
   
   It's because of the initialization done by the current split function. 
Currently we use 3 nested loops (per column, per reducer, per row) to do the 
split. If the column data stays in cache, this is the best way to scale with 
the number of reducers. However, to achieve this we need a lot of 
initialization work to create several vectors. If the input batch is small, 
the initialization overhead dominates, and it can even exceed the cost of 
copying the small batches into bigger ones.
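
   A minimal sketch of that loop structure, in Python for brevity. This is 
not Gluten's actual C++ splitter: `split_batch`, `rows_for_reducer` and the 
list-of-lists layout are hypothetical names chosen for illustration. It only 
shows why the setup step is a fixed cost per batch, regardless of how few 
rows the batch holds.

```python
from typing import List, Sequence


def split_batch(
    columns: Sequence[Sequence[int]],
    partition_ids: Sequence[int],
    num_reducers: int,
) -> List[List[List[int]]]:
    """Split one columnar batch into per-reducer column buffers.

    Illustrative only: names and data layout are assumptions, not Gluten's API.
    """
    # Initialization: paid once per input batch. The number of vectors
    # created here grows with num_reducers and the column count, not with
    # the row count, so a tiny batch pays the same setup as a large one.
    rows_for_reducer: List[List[int]] = [[] for _ in range(num_reducers)]
    for row, pid in enumerate(partition_ids):
        rows_for_reducer[pid].append(row)
    out: List[List[List[int]]] = [
        [[] for _ in columns] for _ in range(num_reducers)
    ]

    # Split: per column, per reducer, per row. One column is read in full
    # before moving to the next, so it can stay in cache while it is
    # scattered to every reducer.
    for col_idx, column in enumerate(columns):
        for reducer in range(num_reducers):
            dst = out[reducer][col_idx]
            for row in rows_for_reducer[reducer]:
                dst.append(column[row])
    return out
```

   With many reducers and only a handful of rows, most of the time goes to 
the setup lists rather than the copy loop, which is the case that combining 
small batches before the split is meant to avoid.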
   
   Another issue is that if the batch is too large and exceeds the cache 
size, performance will be very poor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

