marin-ma commented on PR #5326: URL: https://github.com/apache/incubator-gluten/pull/5326#issuecomment-2044118499
@yangzhg Thanks for the detailed explanation. Gluten has configurations that explicitly limit the batch size to 32k (https://github.com/marin-ma/gluten/blob/cf391fae10da22d73c451aced5cd05e474e3bccd/shims/common/src/main/scala/io/glutenproject/GlutenConfig.scala#L929-L946), and these configurations are passed to the Velox QueryContext here: https://github.com/apache/incubator-gluten/blob/c60b5d90e0be23430672d732f9d574267b67c06e/cpp/velox/compute/WholeStageResultIterator.cc#L477-L480. This relies on the assumption that the batch size produced by the Velox pipeline is always <= 32k. As long as the input batch size stays within this limit, using uint16_t should not be a problem. However, it seems there are cases where the output of the Velox pipeline exceeds the configured batch size, and we are not sure whether that is expected behavior in Velox. Could you raise an issue in Velox describing the case you've encountered, where the HashAgg output exceeds the configured batch size? Thanks!
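To make the concern concrete, here is a minimal C++ sketch of what happens when a row count larger than a uint16_t can hold is narrowed without a check. The helper names (`toRowIndex16`, `uncheckedNarrow`, `narrowDebugChecked`) are hypothetical and exist only for this illustration. They are not part of Gluten or Velox, and the sketch does not reflect how either project handles batch sizes internally.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (not a Gluten or Velox API): narrows a row count to
// uint16_t only when it fits, so an oversized batch is reported to the caller
// instead of silently wrapping.
inline std::optional<uint16_t> toRowIndex16(uint32_t numRows) {
  if (numRows > std::numeric_limits<uint16_t>::max()) {
    return std::nullopt;
  }
  return static_cast<uint16_t>(numRows);
}

// Unchecked narrowing: the value wraps modulo 65536, which is the failure
// mode to worry about if an upstream operator emits more rows than expected.
inline uint16_t uncheckedNarrow(uint32_t numRows) {
  return static_cast<uint16_t>(numRows);
}

// Debug-build variant: asserts the precondition that the batch fits in 16 bits.
inline uint16_t narrowDebugChecked(uint32_t numRows) {
  assert(numRows <= std::numeric_limits<uint16_t>::max());
  return static_cast<uint16_t>(numRows);
}
```

With these helpers, a batch of 32768 rows narrows safely, while a batch of 70000 rows is rejected by `toRowIndex16` and wraps to a much smaller value under `uncheckedNarrow`. A check of this kind would surface an oversized batch early rather than corrupting row indices.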
