NEUpanning opened a new issue, #10200:
URL: https://github.com/apache/incubator-gluten/issues/10200

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   Currently, VeloxRssSortShuffleWriter estimates the RowVector batch size via 
[RowVector::estimateFlatSize](https://github.com/apache/incubator-gluten/blob/main/cpp/velox/shuffle/VeloxRssSortShuffleWriter.cc#L54),
 but a string buffer shared by multiple vectors is counted once per vector 
instead of once overall. As a result, the estimate can be far larger than the 
actual batch size, so the data sent to Celeborn is split into too many small 
batches, leading to poor performance.
   
   I think we can count each shared string buffer only once to solve this 
issue.
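
A minimal sketch of the idea, not the Gluten or Velox implementation: the toy type `ToyStringColumn` and the functions `naiveSize` and `dedupedSize` are hypothetical names made up for illustration. It models columns that reference shared string buffers and compares an estimate that charges every column for each buffer with one that tracks buffer identity and counts each buffer once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a string column: it owns some fixed-width data
// (offsets, nulls) and references one or more possibly shared string buffers.
struct ToyStringColumn {
  std::size_t fixedBytes;
  std::vector<std::shared_ptr<const std::string>> stringBuffers;
};

// Naive estimate: every column is charged the full size of every buffer it
// references, so a buffer shared by N columns is counted N times.
std::size_t naiveSize(const std::vector<ToyStringColumn>& columns) {
  std::size_t total = 0;
  for (const auto& col : columns) {
    total += col.fixedBytes;
    for (const auto& buf : col.stringBuffers) {
      assert(buf != nullptr);
      total += buf->size();
    }
  }
  return total;
}

// Deduplicated estimate: remember which buffers were already counted (by
// address) and add each distinct buffer only once.
std::size_t dedupedSize(const std::vector<ToyStringColumn>& columns) {
  std::size_t total = 0;
  std::unordered_set<const std::string*> seen;
  for (const auto& col : columns) {
    total += col.fixedBytes;
    for (const auto& buf : col.stringBuffers) {
      assert(buf != nullptr);
      if (seen.insert(buf.get()).second) {
        total += buf->size();
      }
    }
  }
  return total;
}
```

With three columns of 16 fixed bytes each that all reference the same 1000-byte buffer, `naiveSize` charges the buffer three times while `dedupedSize` charges it once. Counting the whole shared buffer once is still only an upper bound on the bytes a single batch actually uses.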
   
   ### Gluten version
   
   Gluten-1.3
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]