Re: [PR] [VL] Optimize the performance of hash based shuffle by accumulating batches [incubator-gluten]

via GitHub Mon, 03 Jun 2024 00:16:55 -0700


FelixYBW commented on PR #5951:
URL: 
https://github.com/apache/incubator-gluten/pull/5951#issuecomment-2144449307


   Thank you for the improvement.
   
   The ideal case for the current split function is an input batch that is as 
large as possible while all of its columns still fit into the L2 cache. Once the 
column data no longer fits into the L2 cache, split performance drops dramatically. 
   
   On the other hand, if the batch size is too small, the overhead of preparing 
the split can't be ignored. We should have room to improve in this 
case.
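
   To make the two bounds concrete, here is a rough sketch of how a row target 
could be derived from the L2 cache size and used to decide when accumulated 
batches should be split. Both helper names are hypothetical and are not part of 
Gluten or Velox. The sketch assumes fixed-width columns only and ignores the 
partition buffers that the split writes to, so a real budget would need to be 
smaller than the full L2 size.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Gluten/Velox API).
// Returns the largest row count whose flat column data fits in `cacheBytes`,
// assuming every column is fixed-width with the given byte width per value.
std::size_t maxRowsFittingCache(const std::vector<std::size_t>& columnWidths,
                                std::size_t cacheBytes) {
  const std::size_t bytesPerRow = std::accumulate(
      columnWidths.begin(), columnWidths.end(), std::size_t{0});
  if (bytesPerRow == 0) {
    return 0;  // No columns: nothing to size against.
  }
  return cacheBytes / bytesPerRow;
}

// Hypothetical accumulation policy: keep buffering small input batches and
// split only when adding the next batch would exceed the row target.
// An empty buffer never triggers a split, so an oversized first batch is
// accepted as-is rather than rejected.
bool shouldSplitNow(std::size_t bufferedRows,
                    std::size_t incomingRows,
                    std::size_t targetRows) {
  return bufferedRows > 0 && bufferedRows + incomingRows > targetRows;
}
```

   For example, with an assumed 2 MiB L2 cache and three columns of 8, 4 and 4 
bytes, the target works out to 131072 rows per split.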
   
   Also, in the current implementation we must flatten the RowVector before 
splitting. Flattening has its own overhead, which has not been analyzed yet.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


