yangzhg opened a new issue, #5307:
URL: https://github.com/apache/incubator-gluten/issues/5307

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   In the `VeloxShuffleWriter` component of the project, several variables are 
declared as `uint16_t`, including `partition2RowCount_`, 
`partition2RowOffsetBase_`, `rowOffset2RowId_`, `partitionBufferSize_`, and 
`partitionBufferBase_`. These variables track the row count for each 
partition, manage row offsets, and hold partition buffer sizes and write 
positions.
   
   However, these `uint16_t` variables carry an overflow risk when 
interfacing with Velox's `RowVector` data structure. The size of a 
`RowVector` is of type `vector_size_t` (`int32_t`), which admits a far 
larger range of values than `uint16_t`. When a `RowVector` holds more rows 
than the maximum a `uint16_t` can represent (65,535), the narrowing 
assignment silently wraps.
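   A minimal standalone sketch of the wrap-around (the variable names here are illustrative, not taken from the Gluten code):

   ```cpp
   #include <cstdint>
   #include <iostream>

   int main() {
     // vector_size_t in Velox is int32_t; a RowVector can hold far more
     // than 65,535 rows. 70,000 is a hypothetical oversized batch.
     int32_t rowVectorSize = 70000;

     // Assigning to uint16_t (as partition2RowCount_ etc. would) wraps
     // modulo 2^16: 70000 % 65536 == 4464, with no warning at runtime.
     uint16_t truncated = static_cast<uint16_t>(rowVectorSize);

     std::cout << truncated << "\n";  // prints 4464, not 70000
     return 0;
   }
   ```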
   
   Moreover, Velox's `HashAggregation::getOutput` method does not strictly 
bound the number of rows it outputs. This makes batches larger than 65,535 
rows realistic in practice and further exacerbates the overflow risk in 
`VeloxShuffleWriter`, given the existing data type mismatches.
   
   
https://github.com/facebookincubator/velox/blob/84ae6bfd6f61775b9e20742cdc5bc73ecd0829c3/velox/exec/HashAggregation.cpp#L266
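   One possible mitigation, besides widening the member types, is to fail loudly on the narrowing conversion. The helper below is a hypothetical sketch, not code from the Gluten repository:

   ```cpp
   #include <cstdint>
   #include <limits>
   #include <stdexcept>

   // Hypothetical helper: narrow an int32_t row count to uint16_t,
   // throwing instead of silently wrapping when the value is out of range.
   uint16_t checkedNarrow(int32_t rows) {
     if (rows < 0 || rows > std::numeric_limits<uint16_t>::max()) {
       throw std::overflow_error("row count exceeds uint16_t range");
     }
     return static_cast<uint16_t>(rows);
   }
   ```

   Widening the members to `uint32_t`/`vector_size_t` avoids the check entirely; the guard above only converts a silent corruption into an explicit failure.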
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

