Re: [PR] [GLUTEN-5307][VL] Fix Potential Overflow Issue in VeloxShuffleWriter Due to Mismatched Data Types of RowNumber [incubator-gluten]


marin-ma commented on PR #5326:
URL: 
https://github.com/apache/incubator-gluten/pull/5326#issuecomment-2044118499


   @yangzhg Thanks for the detailed explanation. Gluten has configurations 
that explicitly limit the batch size to 32k: 
https://github.com/marin-ma/gluten/blob/cf391fae10da22d73c451aced5cd05e474e3bccd/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala#L929-L946 
(the configurations are passed to the Velox QueryContext here: 
https://github.com/apache/incubator-gluten/blob/c60b5d90e0be23430672d732f9d574267b67c06e/cpp/velox/compute/WholeStageResultIterator.cc#L477-L480). 
These settings rest on the assumption that the batch size produced by the 
Velox pipeline is always <= 32k. As long as the input batch size stays within 
this limit, using uint16_t should not be a problem. 
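
   To illustrate why the limit matters: uint16_t can hold values up to 65535, 
so a 32k batch fits, but a larger batch would wrap around silently when its row 
number is narrowed. The sketch below is only an illustration; 
`uncheckedRowNumber`, `checkedRowNumber` and `demoWraparound` are hypothetical 
names and are not taken from Gluten or Velox.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not Gluten or Velox code): narrows a row number to
// uint16_t without any check. Values above 65535 are reduced modulo 65536.
inline uint16_t uncheckedRowNumber(uint32_t numRows) {
  return static_cast<uint16_t>(numRows);
}

// Hypothetical helper (not Gluten or Velox code): narrows a row number to
// uint16_t and throws instead of wrapping when the value does not fit.
inline uint16_t checkedRowNumber(uint32_t numRows) {
  if (numRows > std::numeric_limits<uint16_t>::max()) {
    throw std::overflow_error("row number does not fit in uint16_t");
  }
  return static_cast<uint16_t>(numRows);
}

// Small demonstration of both behaviors.
inline void demoWraparound() {
  // A batch within the 32k limit is represented exactly.
  assert(uncheckedRowNumber(32768) == 32768);
  // A batch of 70000 rows wraps: 70000 - 65536 = 4464.
  assert(uncheckedRowNumber(70000) == 4464);
  // The checked variant accepts values that fit.
  assert(checkedRowNumber(65535) == 65535);
}
```

   A checked conversion like this would turn an oversized batch into a visible 
error rather than corrupted row indices. Whether to guard or to widen the type 
is a separate design choice.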
   
   However, it seems there are cases where the output of the Velox pipeline 
can exceed the configured batch size. We are not sure whether this is expected 
behavior in Velox. Could you raise an issue in Velox describing the case 
you've encountered, where the Hashagg output exceeds the configured batch 
size? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


