asubiotto opened a new issue, #7130:
URL: https://github.com/apache/arrow-rs/issues/7130

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Using DataFusion to run a window function partitioned by a column of a 
nested data type fails during execution with a nested comparison error:
   ```
   InvalidArgumentError("Nested comparison: Struct([Field { name: \"f1\", 
data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: 
{} }]) IS DISTINCT FROM Struct([Field { name: \"f1\", data_type: Int64, 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) (hint: use 
make_comparator instead)")
   ```
   This is a feature request to add nested partitioning support to the 
partition kernel:
   
https://github.com/apache/arrow-rs/blob/d4b9482f5dee47a0f2f7afc129db83b8ac8df406/arrow-ord/src/partition.rs#L126
   
   **Describe the solution you'd like**
   `partition` delegates to `distinct`, which does not support nested 
comparisons:
   
https://github.com/apache/arrow-rs/blob/d4b9482f5dee47a0f2f7afc129db83b8ac8df406/arrow-ord/src/cmp.rs#L179-L181
   My proposal is to add a check for nested type columns and use 
`make_comparator` to check for value distinctness instead.
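
   A minimal sketch of the idea, using only the standard library so it runs on its own. It assumes the comparator has the same shape as the closure returned by `make_comparator` (a function from two row indices to an `Ordering`). The names `partition_starts` and `struct_rows_cmp` are hypothetical and are not part of arrow-rs. `struct_rows_cmp` stands in for a struct column with a single nullable `Int64` field `f1`. In the real kernel the closure would presumably come from calling `make_comparator` on the nested column with itself.

```rust
use std::cmp::Ordering;

/// Hypothetical helper (not the arrow-rs API): returns the start offset of
/// every partition in a column of `len` rows. Adjacent rows belong to the
/// same partition when the comparator reports them as equal.
fn partition_starts<F>(len: usize, cmp: F) -> Vec<usize>
where
    F: Fn(usize, usize) -> Ordering,
{
    let mut starts = Vec::new();
    if len == 0 {
        return starts;
    }
    // The first row always opens a partition.
    starts.push(0);
    for i in 1..len {
        // A non-equal ordering between neighbours marks a boundary.
        if cmp(i - 1, i) != Ordering::Equal {
            starts.push(i);
        }
    }
    starts
}

/// Stand-in for a comparator over a struct column with one nullable field.
/// `None` models a null value; two nulls compare as equal, so they are not
/// treated as distinct from each other.
fn struct_rows_cmp(rows: Vec<Option<i64>>) -> impl Fn(usize, usize) -> Ordering {
    move |i, j| rows[i].cmp(&rows[j])
}
```

   With rows `[1, 1, null, null, 2]`, `partition_starts(5, struct_rows_cmp(rows))` yields the start offsets `[0, 2, 4]`. Only equality of the ordering is used, so the null ordering chosen by the comparator does not affect where the boundaries fall.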
   
   **Describe alternatives you've considered**
   - Expanding nested array fields to primitive arrays. This seems costly.
   - Allowing nested comparisons in `compare_op` for op types where null 
ordering semantics don't matter (which I think is the case here). This is 
another option, but the proposed approach seems like a more general solution 
that can be swapped out if performance becomes an issue.

