alamb commented on code in PR #13802:
URL: https://github.com/apache/datafusion/pull/13802#discussion_r1888360133
##########
datafusion/functions/src/strings.rs:
##########
@@ -124,8 +125,18 @@ pub struct StringArrayBuilder {
impl StringArrayBuilder {
pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
- let mut offsets_buffer =
- MutableBuffer::with_capacity((item_capacity + 1) *
size_of::<i32>());
+ // MutableBuffer::with_capacity will round up to multiples of
`ALIGNMENT`
+ let capacity = std::cmp::max(
Review Comment:
The idea of the `with_capacity` is to allocate enough space for
`item_capacity + 1` offsets, each of which is an `i32`
The code below then assumes that it has successfully allocated space for
that many offsets and uses the `unsafe` API assuming that the allocation is
sized for `item_capacity + 1` items
In the case where the `capacity` calculation overflows, that assumption is
violated.
Thus, I think the right fix is for the code in DataFusion to check that the
value it passes to `mutable` is in fact sufficient to hold `item_capacity + 1`
`i32` values
I think you can do this by using
https://doc.rust-lang.org/std/primitive.i32.html#method.checked_mul instead of
`*` and panic'ing if the calculation is bad
##########
datafusion/functions/src/strings.rs:
##########
@@ -124,8 +125,18 @@ pub struct StringArrayBuilder {
impl StringArrayBuilder {
pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
- let mut offsets_buffer =
- MutableBuffer::with_capacity((item_capacity + 1) *
size_of::<i32>());
+ // MutableBuffer::with_capacity will round up to multiples of
`ALIGNMENT`
+ let capacity = std::cmp::max(
+ item_capacity.saturating_add(1).saturating_mul(ALIGNMENT),
+ data_capacity,
+ );
+ // rust core's Layout::from_size_align is capped at isize::Max
+ let capacity = std::cmp::min(
Review Comment:
I am not sure what case you are trying to prevent by checking `ALIGNMENT`,
but it looks like maybe you have found a problem passing large (but not
overflowing) values to `MutableBuffer::with_capacity`
If this is the case, I have two suggestions:
1. Such a check belongs in arrow-rs (`MutableBuffer::with_capacity`)
2. File a ticket explaining what the issue is in arrow-rs along with a
reproducer before adding additional checks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]