yangzhg opened a new issue, #5307: URL: https://github.com/apache/incubator-gluten/issues/5307
### Backend

VL (Velox)

### Bug description

In the `VeloxShuffleWriter` component, several variables are declared with the `uint16_t` data type: `partition2RowCount_`, `partition2RowOffsetBase_`, `rowOffset2RowId_`, `partitionBufferSize_`, and `partitionBufferBase_`. They track the row count for each partition, manage row offsets, and hold partition buffer sizes and write positions.

These `uint16_t` variables can overflow when interfacing with Velox's `RowVector` data structure. The size of a `RowVector` in Velox is of type `vector_size_t` (`int32_t`), which has a much larger range than `uint16_t`. When a `RowVector` holds more rows than the maximum value a `uint16_t` can represent (65,535), row counts and offsets stored in these variables overflow.

In addition, Velox's `HashAggregation::getOutput` method does not strictly limit the number of rows it outputs. Combined with the data type mismatch above, this makes an overflow in `VeloxShuffleWriter` more likely.

https://github.com/facebookincubator/velox/blob/84ae6bfd6f61775b9e20742cdc5bc73ecd0829c3/velox/exec/HashAggregation.cpp#L266

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

_No response_
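To illustrate the overflow described in the bug description, here is a minimal sketch of what happens when an `int32_t` row count is narrowed to `uint16_t`. It is not taken from `VeloxShuffleWriter`: the helper names `storeRowCountUnchecked` and `storeRowCountChecked` are hypothetical, and the `vector_size_t` alias is redefined locally to match the `int32_t` type stated in the report.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Local stand-in for Velox's vector_size_t, which the report states is int32_t.
using vector_size_t = int32_t;

// Hypothetical helper: stores a row count in a uint16_t counter without checking.
// Conversion to an unsigned type is performed modulo 2^16, so values above
// 65,535 silently wrap around instead of raising an error.
inline uint16_t storeRowCountUnchecked(vector_size_t numRows) {
  return static_cast<uint16_t>(numRows);
}

// Hypothetical helper: a checked narrowing that reports when the row count
// does not fit into uint16_t, so the caller can split the batch or fail loudly.
inline std::optional<uint16_t> storeRowCountChecked(vector_size_t numRows) {
  if (numRows < 0 || numRows > std::numeric_limits<uint16_t>::max()) {
    return std::nullopt;
  }
  return static_cast<uint16_t>(numRows);
}
```

With these helpers, a batch of 70,000 rows is recorded as 4,464 rows by the unchecked version (70,000 mod 65,536), while the checked version returns an empty `std::optional`. Widening the counters to a 32-bit type would be another way to avoid the wraparound; which approach suits the shuffle writer is left open here.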
