lyne7-sc commented on code in PR #22390:
URL: https://github.com/apache/datafusion/pull/22390#discussion_r3282407850


##########
datafusion/functions-nested/src/remove.rs:
##########
@@ -468,6 +531,98 @@ fn general_remove<OffsetSize: OffsetSizeTrait>(
     )?))
 }
 
+/// For each element of `list_array[i]`, removes up to `arr_n[i]` occurrences
+/// of `needle[0]` (scalar element broadcasted).
+///
+/// This is a specialized version of `general_remove` for scalar elements that
+/// uses bulk comparison for better performance.
+fn general_remove_with_scalar<OffsetSize: OffsetSizeTrait>(
+    list_array: &GenericListArray<OffsetSize>,
+    needle: &ArrayRef,
+    arr_n: &[i64],
+) -> Result<ArrayRef> {
+    let list_field = match list_array.data_type() {
+        DataType::List(field) | DataType::LargeList(field) => field,
+        _ => {
+            return exec_err!(
+                "Expected List or LargeList data type, got {:?}",
+                list_array.data_type()
+            );
+        }
+    };
+    let original_data = list_array.values().to_data();

Review Comment:
   I now slice the values to the range actually referenced by the offsets.

   That said, I wanted to understand your concern better: when a
   `GenericListArray` is sliced, `values()` returns the full underlying array,
   and `to_data()` on it wraps the existing buffer references into `ArrayData`
   without copying. So the main downside I could identify is that
   `Capacities::Array(original_data.len())` over-estimates the pre-allocation
   for sliced inputs. Were you thinking of a different inefficiency, or is the
   over-allocation what you had in mind?
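
   To make the over-allocation point concrete, here is a minimal std-only
   sketch. It is a toy model, not the arrow-rs API or the code in this PR:
   `ToyList`, `slice`, `referenced_values`, and `capacity_estimates` are
   hypothetical names, and the shared `Arc<Vec<i32>>` only stands in for the
   child buffer that a sliced list keeps referencing in full.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a list array: a shared child buffer plus offsets.
/// Row `i` covers `values[offsets[i]..offsets[i + 1]]`.
struct ToyList {
    values: Arc<Vec<i32>>, // shared child values, never copied on slice
    offsets: Vec<usize>,   // one more entry than the number of rows
}

impl ToyList {
    /// Zero-copy slice: only the window of offsets changes, the child
    /// buffer is shared in full with the parent.
    fn slice(&self, start: usize, len: usize) -> ToyList {
        ToyList {
            values: Arc::clone(&self.values),
            offsets: self.offsets[start..=start + len].to_vec(),
        }
    }

    /// The part of the child buffer that the offsets actually reference.
    fn referenced_values(&self) -> &[i32] {
        let first = self.offsets[0];
        let last = self.offsets[self.offsets.len() - 1];
        &self.values[first..last]
    }
}

/// Returns (capacity based on the full child length,
///          capacity based on the referenced range only).
fn capacity_estimates(list: &ToyList) -> (usize, usize) {
    (list.values.len(), list.referenced_values().len())
}
```

   In this model, slicing one row out of a three-row list over ten child
   values leaves the child length at ten while the referenced range shrinks to
   that row's length, which is the gap between the two capacity estimates.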



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

