alamb commented on code in PR #13966: URL: https://github.com/apache/datafusion/pull/13966#discussion_r1900398739
##########
datafusion/functions-nested/src/set_ops.rs:
##########
@@ -516,11 +516,18 @@ fn general_array_distinct<OffsetSize: OffsetSizeTrait>(
     let mut new_arrays = Vec::with_capacity(array.len());
     let converter = RowConverter::new(vec![SortField::new(dt)])?;
     // distinct for each list in ListArray
-    for arr in array.iter().flatten() {
+    for arr in array.iter() {
+        let last_offset: OffsetSize = offsets.last().copied().unwrap();
+        if arr.is_none() {
+            // Add same offset for null
+            offsets.push(last_offset);
+            continue;
+        }
+
+        let arr = arr.unwrap();

Review Comment:
   I think another way to express this pattern without having to do `unwrap` is:
   
   ```suggestion
           let Some(arr) = arr else {
               // Add same offset for null
               offsets.push(last_offset);
               continue;
           };
   ```

##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -5674,6 +5674,13 @@ select array_distinct([sum(a)]) from t1 where a > 100 group by b;
 statement ok
 drop table t1;
 
+query ?
+select array_distinct(a) from values ([1, 2, 3]), (null), ([1, 3, 1]) as X(a);
+----
+[1, 2, 3]
+NULL
+[1, 3]

Review Comment:
   > Does this mean that `datafusion-cli -c select array_distinct(null);` should also succeed? It seems that `array_distinct` only accepts arguments of array type.
   
   > I would expect that `array_distinct(null)` would return `null` as well.
   
   A few lines up there seems to be a reference to
   - https://github.com/apache/datafusion/issues/7142
   
   ```
   #TODO: https://github.com/apache/datafusion/issues/7142
   #query ?
   #select array_distinct(null);
   #----
   #NULL
   ```
   
   I tried it with this PR and found that the query still doesn't work.
   
   Thus I think this PR makes that behavior neither better nor worse.

##########
datafusion/functions-nested/src/set_ops.rs:
##########
@@ -538,6 +545,7 @@ fn general_array_distinct<OffsetSize: OffsetSizeTrait>(
         Arc::clone(field),
         offsets,
         values,
-        None,
+        // Keep the list nulls

Review Comment:
   👍
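
For illustration, here is a minimal, std-only sketch of the `let ... else` pattern from the first review comment, applied to the same idea of repeating the last offset for a null list. It does not use Arrow or DataFusion: `distinct_per_row`, the `Option<Vec<i32>>` rows, and the tuple return value are hypothetical stand-ins for the list array, its offsets, and its null buffer. `let ... else` needs Rust 1.65 or later, and the `else` block must be followed by a semicolon.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a list array: each row is either NULL (`None`)
/// or a list of values. Returns (offsets, flattened values, validity flags).
fn distinct_per_row(rows: &[Option<Vec<i32>>]) -> (Vec<usize>, Vec<i32>, Vec<bool>) {
    let mut offsets = vec![0usize];
    let mut values = Vec::new();
    let mut validity = Vec::with_capacity(rows.len());

    for row in rows {
        // `offsets` always holds at least the initial 0.
        let last_offset = *offsets.last().expect("offsets starts non-empty");

        let Some(row) = row else {
            // NULL row: repeat the previous offset so the row has length 0,
            // and record it as null instead of silently dropping it.
            offsets.push(last_offset);
            validity.push(false);
            continue;
        };

        // Keep the first occurrence of each value, preserving input order.
        let mut seen = HashSet::new();
        for &v in row {
            if seen.insert(v) {
                values.push(v);
            }
        }
        offsets.push(values.len());
        validity.push(true);
    }

    (offsets, values, validity)
}
```

In this sketch the output keeps one offset entry and one validity flag per input row, so a null row stays null rather than shifting the rows after it, which is the behavior the new `array.slt` test case checks for.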