kosiew commented on code in PR #22295:
URL: https://github.com/apache/datafusion/pull/22295#discussion_r3292651626


##########
datafusion/functions-nested/src/repeat.rs:
##########
@@ -238,18 +238,48 @@ fn general_list_repeat<O: OffsetSizeTrait>(
     for i in 0..count_array.len() {
         let count = get_count_with_validity(count_array, i);
         if count > 0 {
-            outer_total += count;
+            outer_total = outer_total.checked_add(count).ok_or_else(|| {
+                DataFusionError::Execution(
+                    "array_repeat: repeated list count exceeds capacity".to_string(),
+                )
+            })?;
             if list_array.is_valid(i) {
                 let len = list_offsets[i + 1].to_usize().unwrap()
                     - list_offsets[i].to_usize().unwrap();
-                inner_total += len * count;
+                let repeated_len = len.checked_mul(count).ok_or_else(|| {
+                    DataFusionError::Execution(
+                        "array_repeat: repeated inner array length exceeds capacity"
+                            .to_string(),
+                    )
+                })?;
+                inner_total = inner_total.checked_add(repeated_len).ok_or_else(|| {
+                    DataFusionError::Execution(
+                        "array_repeat: repeated inner array length exceeds capacity"
+                            .to_string(),
+                    )
+                })?;
             }
         }
     }
+    if O::from_usize(outer_total).is_none() {

Review Comment:
   Nice addition. I think it could also be helpful to add a focused regression 
test for the new `O::from_usize(...)` offset-bound path.
   
   Right now the added SQL/unit tests cover the `len * count` `usize` overflow 
case, but they do not exercise the pre-allocation guard that prevents 
attempting very large `List` offset/bitmap allocations when only the Arrow 
offset type limit is exceeded.
   
   A couple cases that might be useful:
   - a `List` input whose repeated inner length exceeds `i32::MAX` without 
overflowing `usize`
   - an empty list with count `i32::MAX + 1`
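
To make the two suggested cases concrete, here is a minimal, std-only sketch of the checks they would exercise. It is not the PR's code: `checked_repeat_totals`, its parameters, and its error strings are hypothetical stand-ins, and `max_offset` plays the role of the limit that `O::from_usize(...)` enforces for a given offset type (`i32::MAX` for `List`). Applying the same bound to the inner total is an assumption, since the hunk above ends at the outer check.

```rust
/// Hypothetical stand-in for the pre-allocation checks in `general_list_repeat`.
/// `lens[i]` is the inner length of row `i` (`None` for a null list) and
/// `counts[i]` is its repeat count. `max_offset` models the largest value the
/// Arrow offset type can represent.
fn checked_repeat_totals(
    lens: &[Option<usize>],
    counts: &[usize],
    max_offset: usize,
) -> Result<(usize, usize), String> {
    let mut outer_total: usize = 0;
    let mut inner_total: usize = 0;

    for (len, &count) in lens.iter().zip(counts) {
        if count == 0 {
            continue;
        }
        // Guard 1: `usize` overflow while summing the repeat counts.
        outer_total = outer_total
            .checked_add(count)
            .ok_or_else(|| "usize overflow: list count".to_string())?;

        if let Some(len) = *len {
            // Guard 2: `usize` overflow in `len * count` and in the running sum.
            let repeated_len = len
                .checked_mul(count)
                .ok_or_else(|| "usize overflow: inner length".to_string())?;
            inner_total = inner_total
                .checked_add(repeated_len)
                .ok_or_else(|| "usize overflow: inner length".to_string())?;
        }
    }

    // Guard 3: totals fit in `usize` but not in the offset type.
    // (Checking `inner_total` here is an assumption of this sketch.)
    if outer_total > max_offset || inner_total > max_offset {
        return Err("offset limit exceeded".to_string());
    }
    Ok((outer_total, inner_total))
}
```

With `max_offset = i32::MAX as usize`, the inputs `lens = [Some(2)]`, `counts = [i32::MAX as usize]` and `lens = [Some(0)]`, `counts = [i32::MAX as usize + 1]` both pass the `usize` guards and are rejected only by the final offset-bound check, which is the path the existing tests do not reach.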



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

