kosiew commented on code in PR #22296:
URL: https://github.com/apache/datafusion/pull/22296#discussion_r3292633354


##########
datafusion/functions-nested/src/resize.rs:
##########
@@ -206,18 +206,20 @@ fn general_list_resize<O: OffsetSizeTrait + TryInto<i64>>(
         if array.is_null(row_index) {
             continue;
         }
-        let target_count = 
count_array.value(row_index).to_usize().ok_or_else(|| {
-            internal_datafusion_err!("array_resize: failed to convert size to 
usize")
-        })?;
+        let target_count = target_count::<O>(count_array, row_index)?;
         output_values_len =
             output_values_len.checked_add(target_count).ok_or_else(|| {
-                internal_datafusion_err!("array_resize: output size overflow")
+                datafusion_common::DataFusionError::Execution(
+                    "array_resize: target size too large".to_string(),
+                )
             })?;
         let current_len = (offset_window[1] - 
offset_window[0]).to_usize().unwrap();
         if target_count > current_len {
             max_extra = max_extra.max(target_count - current_len);
         }
     }
+    validate_value_capacity(&data_type, output_values_len)?;

Review Comment:
   The pre-allocation validation still does not fully enforce the output-list 
offset invariant for `List` arrays.
   
   `target_count::<i32>` validates each row independently, but 
`output_values_len` can still exceed `i32::MAX` when multiple non-null rows are 
combined. For example, two `List` rows each resized to `1_073_741_824` elements 
produce a cumulative value length of `2_147_483_648`, which cannot be 
represented by `i32` offsets.
   
   This path can still pass `validate_value_capacity` for many value types and 
reach `MutableArrayData::with_capacities` or offset addition before rejection, 
so the current change only fixes the single-row overflow case.
   
   Could we validate the cumulative `output_values_len` against `O` before any 
fill array or mutable buffer allocation? Something like 
`O::from_usize(output_values_len).is_some()` would work.
   
   It would also be great to add a regression test covering the multi-row 
`List` overflow scenario.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to