jacksonrnewhouse opened a new issue, #9254:
URL: https://github.com/apache/arrow-datafusion/issues/9254

   ### Describe the bug
   
   If you attempt to join two tables on a struct field, the query plans 
successfully, albeit with the struct equality placed in the join `filter` rather 
than in the `on` vector. However, execution fails with "Invalid comparison 
operation". In particular, it triggers this error from arrow-rs: 
https://github.com/apache/arrow-rs/blob/db811083669df66992008c9409b743a2e365adb0/arrow-ord/src/cmp.rs#L202.
   
   ### To Reproduce
   
   I wrote a failing test that performs a self join: 
https://github.com/apache/arrow-datafusion/compare/35.0.0...ArroyoSystems:arrow-datafusion:bug_report/struct_join_fails_at_execution.
 The failure message is:
   ```
   thread 'user_defined::user_defined_aggregates::test_struct_join' panicked at 
datafusion/core/tests/user_defined/user_defined_aggregates.rs:172:60:
   called `Result::unwrap()` on an `Err` value: Execution("Fail to build join 
indices in NestedLoopJoinExec, error:Arrow error: Invalid argument error: 
Invalid comparison operation: Struct([Field { name: \"value\", data_type: 
Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
Field { name: \"time\", data_type: Timestamp(Nanosecond, None), nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }]) == Struct([Field { name: 
\"value\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }, Field { name: \"time\", data_type: Timestamp(Nanosecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])")
   ```
   
   ### Expected behavior
   
   Either the join should fail at planning with a clear error stating that 
joins on structs are not supported or, preferably, DataFusion should support 
joins on two structs of the same type.
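
To make the preferred behavior concrete, here is a minimal sketch in plain Rust of what equality on two structs of the same type could mean: the conjunction of the per-field comparisons. It uses no DataFusion or arrow-rs APIs. The `Row` type, `field_eq`, `struct_eq`, and `nested_loop_join` are hypothetical names chosen for illustration, and the null handling (a null field never matches) is an assumption, not a statement of how DataFusion does or should behave.

```rust
/// Hypothetical row type mirroring the struct in the error message:
/// `value: Float64` and `time: Timestamp(Nanosecond, None)`, both nullable.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Row {
    value: Option<f64>,
    time: Option<i64>, // nanoseconds since the epoch
}

/// Equality of one nullable field. Assumption: SQL-style semantics, where a
/// null on either side never counts as a match.
fn field_eq<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    match (a, b) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

/// Struct equality as the conjunction of the per-field comparisons.
fn struct_eq(a: &Row, b: &Row) -> bool {
    field_eq(a.value, b.value) && field_eq(a.time, b.time)
}

/// Naive nested-loop inner join returning matching (left, right) index pairs.
fn nested_loop_join(left: &[Row], right: &[Row]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if struct_eq(l, r) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

Under these assumptions, a self join over rows whose fields are all non-null pairs each row with itself and with any field-wise duplicates, while a row containing a null field matches nothing.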
   
   ### Additional context
   
   This comes up with Arroyo where we want to join on time windows, e.g. 
sliding and tumbling windows.

