alamb commented on code in PR #7057:
URL: https://github.com/apache/arrow-datafusion/pull/7057#discussion_r1273643821
##########
docs/source/user-guide/expressions.md:
##########
@@ -179,26 +179,28 @@ Unlike to some databases the math functions in Datafusion
works the same way as
## Array Expressions
-| Function | Notes
|
-| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------
|
-| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]` |
-| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
-| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true` |
-| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true` |
-| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true` |
-| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]` |
-| array_fill(element, array) | Returns an array filled with copies
of the given value. |
-| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5` |
-| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2` |
-| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` |
-| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` |
-| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
-| array_remove(array, element) | Removes all elements equal to the
given value from the array. |
-| array_replace(array, from, to) | Replaces a specified element with
another specified element. |
-| array_to_string(array, delimeter) | Converts each element to its text
representation. `array_to_string([1, 2, 3, 4], ',') -> 1,2,3,4` |
-| cardinality(array) | Returns the total number of elements
in the array. `cardinality([[1, 2, 3], [4, 5, 6]]) -> 6` |
-| make_array(value1, [value2 [, ...]]) | Returns an Arrow array using the
specified input expressions. `make_array(1, 2, 3) -> [1, 2, 3]` |
-| trim_array(array, n) | Removes the last n elements from the
array. |
+| Function | Notes
|
+| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------------------------------------------
|
+| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]`
|
+| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
+| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true`
|
+| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true`
|
+| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true`
|
+| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]`
|
+| array_fill(element, array) | Returns an array filled with copies
of the given value.
|
+| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5`
|
+| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2`
|
+| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2`
|
+| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]`
|
+| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]`
|
+| array_remove(array, element) | Removes one element from the array
equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1,
4]` |
+| array_removes(array, element) | Removes all elements from the array
equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1,
4]` |
Review Comment:
Given there is a function named `array_has_all` that matches only if all
arguments are present, it seems to me that more consistent names would be:
* `array_removes` --> `array_remove_all`
* `array_replaces` --> `array_replace_all`
##########
docs/source/user-guide/expressions.md:
##########
@@ -179,26 +179,28 @@ Unlike to some databases the math functions in Datafusion
works the same way as
## Array Expressions
-| Function | Notes
|
-| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------
|
-| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]` |
-| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
-| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true` |
-| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true` |
-| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true` |
-| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]` |
-| array_fill(element, array) | Returns an array filled with copies
of the given value. |
-| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5` |
-| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2` |
-| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` |
-| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` |
-| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
-| array_remove(array, element) | Removes all elements equal to the
given value from the array. |
-| array_replace(array, from, to) | Replaces a specified element with
another specified element. |
-| array_to_string(array, delimeter) | Converts each element to its text
representation. `array_to_string([1, 2, 3, 4], ',') -> 1,2,3,4` |
-| cardinality(array) | Returns the total number of elements
in the array. `cardinality([[1, 2, 3], [4, 5, 6]]) -> 6` |
-| make_array(value1, [value2 [, ...]]) | Returns an Arrow array using the
specified input expressions. `make_array(1, 2, 3) -> [1, 2, 3]` |
-| trim_array(array, n) | Removes the last n elements from the
array. |
+| Function | Notes
|
+| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------------------------------------------
|
+| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]`
|
+| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
+| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true`
|
+| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true`
|
+| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true`
|
+| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]`
|
+| array_fill(element, array) | Returns an array filled with copies
of the given value.
|
+| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5`
|
+| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2`
|
+| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2`
|
+| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]`
|
+| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]`
|
+| array_remove(array, element) | Removes one element from the array
equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1,
4]` |
+| array_removes(array, element) | Removes all elements from the array
equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1,
4]` |
+| array_replace(array, from, to) | Replaces one occurrence of the
specified element with another specified element. `array_replace([1, 2, 2, 3,
1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` |
Review Comment:
```suggestion
| array_replace(array, from, to) | Replaces the first occurrence of
the specified element with another specified element. `array_replace([1, 2, 2,
3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` |
```
##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -1957,20 +2175,124 @@ mod tests {
#[test]
fn test_array_remove() {
- // array_remove([1, 2, 3, 4], 3) = [1, 2, 4]
- let list_array = return_array();
- let arr = array_remove(&[
+ // array_remove([3, 1, 2, 3, 2, 3], 3) = [1, 2, 3, 2, 3]
+ let list_array = return_array_with_repeating_elements().into_array(1);
+ let array = array_remove(&[list_array,
Arc::new(Int64Array::from_value(3, 1))])
+ .expect("failed to initialize function array_remove");
+ let result =
+ as_list_array(&array).expect("failed to initialize function
array_remove");
+
+ assert_eq!(result.len(), 1);
+ assert_eq!(
+ &[1, 2, 3, 2, 3],
+ result
+ .value(0)
+ .as_any()
+ .downcast_ref::<Int64Array>()
+ .unwrap()
+ .values()
+ );
+
+ // array_remove([3, 1, 2, 3, 2, 3], 3, 2) = [1, 2, 2, 3]
+ let list_array = return_array_with_repeating_elements().into_array(1);
+ let array = array_remove(&[
list_array,
- ColumnarValue::Scalar(ScalarValue::Int64(Some(3))),
+ Arc::new(Int64Array::from_value(3, 1)),
+ Arc::new(Int64Array::from_value(2, 1)),
+ ])
+ .expect("failed to initialize function array_remove");
+ let result =
+ as_list_array(&array).expect("failed to initialize function
array_remove");
+
+ assert_eq!(result.len(), 1);
+ assert_eq!(
+ &[1, 2, 2, 3],
+ result
+ .value(0)
+ .as_any()
+ .downcast_ref::<Int64Array>()
+ .unwrap()
+ .values()
+ );
+ }
+
+ #[test]
+ fn test_nested_array_remove() {
+ // array_remove(
+ // [[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 3, 4], [9, 10, 11, 12], [5,
6, 7, 8]],
+ // [1, 2, 3, 4],
+ // ) = [[5, 6, 7, 8], [1, 2, 3, 4], [9, 10, 11, 12], [5, 6, 7, 8]]
+ let list_array =
return_nested_array_with_repeating_elements().into_array(1);
+ let element_array = return_array().into_array(1);
+ let array = array_remove(&[list_array, element_array])
+ .expect("failed to initialize function array_remove");
+ let result =
+ as_list_array(&array).expect("failed to initialize function
array_remove");
+
+ assert_eq!(result.len(), 1);
+ let data = vec![
Review Comment:
I think we should really be testing with NULLs as well
##########
docs/source/user-guide/expressions.md:
##########
@@ -179,26 +179,28 @@ Unlike to some databases the math functions in Datafusion
works the same way as
## Array Expressions
-| Function | Notes
|
-| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------
|
-| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]` |
-| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
-| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true` |
-| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true` |
-| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true` |
-| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]` |
-| array_fill(element, array) | Returns an array filled with copies
of the given value. |
-| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5` |
-| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2` |
-| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` |
-| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` |
-| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
-| array_remove(array, element) | Removes all elements equal to the
given value from the array. |
-| array_replace(array, from, to) | Replaces a specified element with
another specified element. |
-| array_to_string(array, delimeter) | Converts each element to its text
representation. `array_to_string([1, 2, 3, 4], ',') -> 1,2,3,4` |
-| cardinality(array) | Returns the total number of elements
in the array. `cardinality([[1, 2, 3], [4, 5, 6]]) -> 6` |
-| make_array(value1, [value2 [, ...]]) | Returns an Arrow array using the
specified input expressions. `make_array(1, 2, 3) -> [1, 2, 3]` |
-| trim_array(array, n) | Removes the last n elements from the
array. |
+| Function | Notes
|
+| ------------------------------------ |
--------------------------------------------------------------------------------------------------------------------------------------------------
|
+| array_append(array, element) | Appends an element to the end of an
array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]`
|
+| array_concat(array[, ..., array_n]) | Concatenates arrays.
`array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]`
|
+| array_has(array, element) | Returns true if the array contains
the element `array_has([1,2,3], 1) -> true`
|
+| array_has_all(array, sub-array) | Returns true if all elements of
sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true`
|
+| array_has_any(array, sub-array) | Returns true if any elements exist in
both arrays `array_has_any([1,2,3], [1,4]) -> true`
|
+| array_dims(array) | Returns an array of the array's
dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]`
|
+| array_fill(element, array) | Returns an array filled with copies
of the given value.
|
+| array_length(array, dimension) | Returns the length of the array
dimension. `array_length([1, 2, 3, 4, 5]) -> 5`
|
+| array_ndims(array) | Returns the number of dimensions of
the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2`
|
+| array_position(array, element) | Searches for an element in the array,
returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2`
|
+| array_positions(array, element) | Searches for an element in the array,
returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]`
|
+| array_prepend(array, element) | Prepends an element to the beginning
of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]`
|
+| array_remove(array, element) | Removes one element from the array
equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1,
4]` |
Review Comment:
Maybe we could explicitly say "Removes the first element from the array
equal to the given value":
```suggestion
| array_remove(array, element) | Removes the first element from the
array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2,
3, 1, 4]` |
```
##########
docs/source/user-guide/sql/scalar_functions.md:
##########
@@ -1777,24 +1779,72 @@ _Alias of [array_prepend](#array_prepend)._
### `array_remove`
-Removes all elements equal to the given value from the array.
+Removes one element from the array equal to the given value.
Review Comment:
```suggestion
Removes the first element from the array equal to the given value.
```
##########
datafusion/core/tests/sqllogictests/test_files/array.slt:
##########
@@ -907,14 +907,122 @@ select array_positions(column1, make_array(4, 5, 6)),
array_positions(make_array
[6] []
[1] []
-## array_replace
+## array_replace (replaces one occurrence of an element) (aliases:
`list_replace`)
-# array_replace scalar function
+# array_replace scalar function #1
query ???
select array_replace(make_array(1, 2, 3, 4), 2, 3),
array_replace(make_array(1, 4, 4, 5, 4, 6, 7), 4, 0),
array_replace(make_array(1, 2, 3), 4, 0);
----
+[1, 3, 3, 4] [1, 0, 4, 5, 4, 6, 7] [1, 2, 3]
+
+# array_replace scalar function #2 (with optional argument)
+query ???
+select array_replace(make_array(1, 2, 3, 4), 2, 3, 2),
array_replace(make_array(1, 4, 4, 5, 4, 6, 7), 4, 0, 2),
array_replace(make_array(1, 2, 3), 4, 0, 3);
Review Comment:
> What do you think of these features, @alamb and @jayzhan211? Should they
be implemented or not?
I found the fact that `array_replace` has 4 arguments very confusing
initially. All "replace" APIs I can think of (e.g. Rust's
[`replace`](https://doc.rust-lang.org/std/primitive.str.html#method.replace))
take only three arguments: the value, the pattern to search for and the
replacement value.
If we want to implement "replace the first N" values, I think we should add
a new function like `array_replace_n` (modeled after
[`replacen`](https://doc.rust-lang.org/std/primitive.str.html#method.replacen))
It would be great to share the implementation as you have done in this PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]