alamb commented on code in PR #13966:
URL: https://github.com/apache/datafusion/pull/13966#discussion_r1900398739
##########
datafusion/functions-nested/src/set_ops.rs:
##########
@@ -516,11 +516,18 @@ fn general_array_distinct<OffsetSize: OffsetSizeTrait>(
let mut new_arrays = Vec::with_capacity(array.len());
let converter = RowConverter::new(vec![SortField::new(dt)])?;
// distinct for each list in ListArray
- for arr in array.iter().flatten() {
+ for arr in array.iter() {
+ let last_offset: OffsetSize = offsets.last().copied().unwrap();
+ if arr.is_none() {
+ // Add same offset for null
+ offsets.push(last_offset);
+ continue;
+ }
+
+ let arr = arr.unwrap();
Review Comment:
I think another way to express this pattern without having to do `unwrap` is:
```suggestion
let Some(arr) = arr else {
// Add same offset for null
offsets.push(last_offset);
continue;
};
```
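To show the `let ... else` shape in isolation, here is a minimal sketch that models the loop with plain `Option<Vec<i32>>` rows instead of Arrow arrays. The `distinct_offsets` helper and its types are hypothetical, not DataFusion code, and the sort-then-dedup step is only an assumption about how distinct values are produced.

```rust
/// Hypothetical stand-in for the loop in `general_array_distinct`.
/// Rows are plain `Option<Vec<i32>>` values rather than Arrow arrays.
fn distinct_offsets(rows: &[Option<Vec<i32>>]) -> (Vec<i32>, Vec<i32>) {
    let mut offsets: Vec<i32> = vec![0];
    let mut values: Vec<i32> = Vec::new();

    for row in rows {
        let last_offset = *offsets.last().expect("offsets starts non-empty");

        // `let ... else` binds the inner value or takes the early exit,
        // so no separate `is_none()` check and `unwrap()` are needed.
        let Some(row) = row else {
            // Add same offset for null
            offsets.push(last_offset);
            continue;
        };

        // Assumption: distinct values are obtained by sorting and deduplicating.
        let mut distinct = row.clone();
        distinct.sort();
        distinct.dedup();

        offsets.push(last_offset + distinct.len() as i32);
        values.extend(distinct);
    }

    (offsets, values)
}

fn main() {
    let rows = vec![Some(vec![1, 2, 3]), None, Some(vec![1, 3, 1])];
    let (offsets, values) = distinct_offsets(&rows);
    // The null row contributes a repeated offset and no values.
    println!("{:?} {:?}", offsets, values);
}
```

Note the trailing `;` after the `else` block, which `let ... else` requires.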
##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -5674,6 +5674,13 @@ select array_distinct([sum(a)]) from t1 where a > 100
group by b;
statement ok
drop table t1;
+query ?
+select array_distinct(a) from values ([1, 2, 3]), (null), ([1, 3, 1]) as X(a);
+----
+[1, 2, 3]
+NULL
+[1, 3]
Review Comment:
> Does this mean that `datafusion-cli -c select array_distinct(null);` should also succeed? It seems that `array_distinct` only accepts arguments of array type.

I would expect that `array_distinct(null)` would return `null` as well. A
few lines up there seems to be a reference to
https://github.com/apache/datafusion/issues/7142:
```
#TODO: https://github.com/apache/datafusion/issues/7142
#query ?
#select array_distinct(null);
#----
#NULL
```
I tried it with this PR and found that the query still doesn't work.
Thus I think this PR makes the behavior neither better nor worse.
##########
datafusion/functions-nested/src/set_ops.rs:
##########
@@ -538,6 +545,7 @@ fn general_array_distinct<OffsetSize: OffsetSizeTrait>(
Arc::clone(field),
offsets,
values,
- None,
+ // Keep the list nulls
Review Comment:
👍
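For anyone reading along, here is a rough sketch of why the null buffer matters. `ListModel` is a hypothetical, simplified model of a list array (offsets, values, optional validity), not the Arrow API. It relies only on the convention that an absent null buffer means every row is valid.

```rust
/// Hypothetical, simplified model of a list array; not the Arrow API.
struct ListModel {
    offsets: Vec<usize>,
    values: Vec<i32>,
    /// `None` means "every row is valid", mirroring an absent null buffer.
    validity: Option<Vec<bool>>,
}

impl ListModel {
    /// Returns `None` for a null row, otherwise the row's slice of values.
    fn row(&self, i: usize) -> Option<&[i32]> {
        if let Some(validity) = &self.validity {
            if !validity[i] {
                return None;
            }
        }
        Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
    }
}

fn main() {
    let offsets = vec![0, 3, 3, 5];
    let values = vec![1, 2, 3, 1, 3];

    // Without validity, the middle row reads as an empty list.
    let dropped = ListModel {
        offsets: offsets.clone(),
        values: values.clone(),
        validity: None,
    };
    // With validity carried over, the middle row stays null.
    let kept = ListModel {
        offsets,
        values,
        validity: Some(vec![true, false, true]),
    };

    println!("{:?}", dropped.row(1));
    println!("{:?}", kept.row(1));
}
```

Repeating the offset alone is not enough to represent a null row: without the validity information it is indistinguishable from an empty list.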
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]