asubiotto opened a new issue, #7130:
URL: https://github.com/apache/arrow-rs/issues/7130
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Using DataFusion to run a window function partitioned by a column of a nested
data type (e.g. a struct) results in a nested comparison error during execution:
```
InvalidArgumentError("Nested comparison: Struct([Field { name: \"f1\",
data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata:
{} }]) IS DISTINCT FROM Struct([Field { name: \"f1\", data_type: Int64,
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) (hint: use
make_comparator instead)")
```
This is a feature request to add support for partitioning on nested types to the
`partition` kernel:
https://github.com/apache/arrow-rs/blob/d4b9482f5dee47a0f2f7afc129db83b8ac8df406/arrow-ord/src/partition.rs#L126
**Describe the solution you'd like**
`partition` delegates to `distinct`, which does not support nested
comparisons:
https://github.com/apache/arrow-rs/blob/d4b9482f5dee47a0f2f7afc129db83b8ac8df406/arrow-ord/src/cmp.rs#L179-L181
My proposal is to add a check for nested-type columns and, for those, use
`make_comparator` to determine value distinctness instead.
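A minimal sketch of the proposed boundary detection, assuming a row comparator of the shape `make_comparator` returns (`Fn(usize, usize) -> Ordering`). `partition_ranges` is a hypothetical helper, not arrow-rs API: it walks adjacent row pairs and starts a new range wherever two neighbours do not compare `Equal`. To keep the sketch std-only, a closure over a `Vec` stands in for the `DynComparator`; a comment shows roughly how the real comparator would be obtained.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Hypothetical helper (not arrow-rs API): splits `0..len` into maximal runs
/// of consecutive rows that compare `Equal` under `cmp`. It relies only on a
/// row comparator, so it works for any type a comparator can handle,
/// including nested types such as structs.
fn partition_ranges<F>(len: usize, cmp: F) -> Vec<Range<usize>>
where
    F: Fn(usize, usize) -> Ordering,
{
    let mut ranges = Vec::new();
    if len == 0 {
        return ranges;
    }
    let mut start = 0;
    for i in 1..len {
        // A partition boundary exists wherever row i-1 and row i are distinct.
        if cmp(i - 1, i) != Ordering::Equal {
            ranges.push(start..i);
            start = i;
        }
    }
    ranges.push(start..len);
    ranges
}

fn main() {
    // Stand-in for a Struct([f1: Int64]) column: the outer Option models the
    // struct's validity, the inner Option the nullable child field `f1`.
    let rows: Vec<Option<(Option<i64>,)>> = vec![
        Some((Some(1),)),
        Some((Some(1),)),
        Some((None,)),
        Some((None,)),
        None,
        None,
        Some((Some(2),)),
    ];

    // With arrow-rs the comparator would instead come from something like:
    //   let cmp = arrow_ord::ord::make_comparator(
    //       array.as_ref(), array.as_ref(), SortOptions::default())?;
    // Here a closure over `rows` plays the same role; two nulls compare Equal.
    let cmp = |i: usize, j: usize| rows[i].cmp(&rows[j]);

    let ranges = partition_ranges(rows.len(), cmp);
    println!("{:?}", ranges);
}
```

Running it prints `[0..2, 2..4, 4..6, 6..7]`. The two null structs land in one partition because two nulls compare `Equal` (which, as I understand `make_comparator`'s null handling, it does as well), matching `IS DISTINCT FROM` semantics. A per-row dynamic comparator is likely slower than the vectorized `distinct` kernel, so the existing path could stay in place for non-nested columns.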
**Describe alternatives you've considered**
- Expanding nested array fields into primitive arrays. This seems costly.
- Allowing nested comparisons in `compare_op` for those operators where null
ordering semantics don't matter (which I think is the case here). This would
also work, but the proposed approach seems more general and can be swapped out
if performance becomes an issue.
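To illustrate why null ordering should not matter for this use case: a distinctness check only asks whether a comparison returns `Equal`, while the nulls-first/nulls-last choice only decides whether a null sorts `Less` or `Greater` than a non-null. The sketch below models this with a hypothetical `cmp_nullable` function (a stand-in for a comparator configured via `SortOptions::nulls_first`, not arrow code) and checks that both settings agree on distinctness for every pair of sample values.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator over nullable values, parameterized the way a
/// nulls-first/nulls-last sort option would be (a model, not arrow code).
fn cmp_nullable(a: Option<i64>, b: Option<i64>, nulls_first: bool) -> Ordering {
    let null_vs_value = if nulls_first {
        Ordering::Less
    } else {
        Ordering::Greater
    };
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => null_vs_value,
        (Some(_), None) => null_vs_value.reverse(),
        (Some(x), Some(y)) => x.cmp(&y),
    }
}

/// "Distinct" only inspects whether the result is `Equal`, which the null
/// placement never changes.
fn is_distinct(a: Option<i64>, b: Option<i64>, nulls_first: bool) -> bool {
    cmp_nullable(a, b, nulls_first) != Ordering::Equal
}

fn main() {
    let values = [None, Some(-1), Some(0), Some(7)];
    for &a in &values {
        for &b in &values {
            // Both null orderings must agree on distinctness for every pair.
            assert_eq!(is_distinct(a, b, true), is_distinct(a, b, false));
        }
    }
    println!("distinctness is independent of null ordering");
}
```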