WangGuangxin opened a new pull request, #9035:
URL: https://github.com/apache/incubator-gluten/pull/9035

   ## What changes were proposed in this pull request?
   
   We found that ColumnarPartialProject is very sensitive to batch size: for 
small input batches, performance can be much slower than a total fallback, but 
increasing the batch size can yield significant performance improvements.
   
   One example from our workload:
   Before appending `VeloxResizeBatches` (average batch size is 
26,341,825,212 / 2,775,903,310 ≈ 9 rows)
   <img width="870" alt="image" src="https://github.com/user-attachments/assets/95cca983-17ef-4c7d-bfc6-e4499d0aa925" />
   
   After appending `VeloxResizeBatches` (average batch size is 
26,341,825,212 / 25,089,811 ≈ 1049 rows)
   <img width="683" alt="image" src="https://github.com/user-attachments/assets/06cbb146-45bb-499f-b97b-56f8f1c64743" />
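
For reference, the arithmetic behind the two averages can be reproduced with a few lines of Python. This is only a sanity check on the numbers quoted above; the variable names are illustrative and do not correspond to any Gluten or Spark metric names.

```python
# Row and batch counts copied from the metrics quoted in this description.
total_rows = 26_341_825_212
batches_before = 2_775_903_310  # number of batches without VeloxResizeBatches
batches_after = 25_089_811      # number of batches with VeloxResizeBatches

# Average rows per batch in each run.
avg_before = total_rows / batches_before
avg_after = total_rows / batches_after

# How many times fewer batches the operator has to process.
batch_reduction = batches_before / batches_after

print(f"avg rows per batch before: {avg_before:.1f}")
print(f"avg rows per batch after:  {avg_after:.1f}")
print(f"batch count reduced by:    {batch_reduction:.1f}x")
```

In other words, the same number of rows is processed in roughly 110 times fewer batches.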
   
   Although combining batches has a cost, the whole stage's task time 
dropped from 2306h to 1029h.
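
To illustrate the general idea of combining small batches before an expensive operator, here is a minimal Python sketch. It is not the `VeloxResizeBatches` implementation: `resize_batches` and `min_rows` are hypothetical names, and batches are modelled as plain lists of rows rather than columnar vectors.

```python
from typing import Iterable, Iterator, List


def resize_batches(batches: Iterable[List[int]], min_rows: int) -> Iterator[List[int]]:
    """Concatenate consecutive small batches until at least min_rows rows are buffered.

    Hypothetical helper for illustration only; real columnar engines append
    vectors instead of copying Python lists.
    """
    buffer: List[int] = []
    for batch in batches:
        buffer.extend(batch)  # this copy is the "cost to combine batches"
        if len(buffer) >= min_rows:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # flush whatever is left at the end of the input


# Ten tiny 9-row batches, similar in size to the "before" case above.
small_batches = [[i] * 9 for i in range(10)]
merged = list(resize_batches(small_batches, min_rows=30))
merged_sizes = [len(b) for b in merged]
print(merged_sizes)
```

The downstream operator then pays its per-batch overhead once per merged batch instead of once per 9-row batch, which is the trade-off described above.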
   
   (Fixes: \#9034)
   
   ## How was this patch tested?
   
   Tested manually.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

