alamb commented on code in PR #13802:
URL: https://github.com/apache/datafusion/pull/13802#discussion_r1888360133


##########
datafusion/functions/src/strings.rs:
##########
@@ -124,8 +125,18 @@ pub struct StringArrayBuilder {
 
 impl StringArrayBuilder {
     pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
-        let mut offsets_buffer =
-            MutableBuffer::with_capacity((item_capacity + 1) * 
size_of::<i32>());
+        // MutableBuffer::with_capacity will round up to multiples of 
`ALIGNMENT`
+        let capacity = std::cmp::max(

Review Comment:
   The idea of the `with_capacity` is to allocate enough space for 
`item_capacity + 1` offsets, each of which is an `i32`
   
   The code below then assumes that it has successfully allocated space for 
that many offsets and uses the `unsafe` API assuming that the allocation is 
sized for `item_capacity + 1` items
   
   In the case where the `capacity` calculation overflows, that assumption is 
violated. 
   
   Thus, I think the right fix is for the code in DataFusion to check that the 
value it passes to `mutable` is in fact sufficient to hold `item_capacity + 1` 
`i32` values
   
   I think you can do this by using 
https://doc.rust-lang.org/std/primitive.i32.html#method.checked_mul instead of 
`*` and panic'ing if the calculation is bad



##########
datafusion/functions/src/strings.rs:
##########
@@ -124,8 +125,18 @@ pub struct StringArrayBuilder {
 
 impl StringArrayBuilder {
     pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
-        let mut offsets_buffer =
-            MutableBuffer::with_capacity((item_capacity + 1) * 
size_of::<i32>());
+        // MutableBuffer::with_capacity will round up to multiples of 
`ALIGNMENT`
+        let capacity = std::cmp::max(
+            item_capacity.saturating_add(1).saturating_mul(ALIGNMENT),
+            data_capacity,
+        );
+        // rust core's Layout::from_size_align is capped at isize::Max
+        let capacity = std::cmp::min(

Review Comment:
   I am not sure what case you are trying to prevent by checking `ALIGNMENT`, 
but it looks like maybe you have found a problem passing large (but not 
overflowing) values to `MutableBuffer::with_capacity`
   
   If this is the case, I have two suggestions:
   1. Such a check belongs in arrow-rs (`MutableBuffer::with_capacity`)
   2. File a ticket explaining what the issue is in arrow-rs along with a 
reproducer before adding additional checks. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to