Blizzara opened a new issue, #10856:
URL: https://github.com/apache/datafusion/issues/10856

   ### Is your feature request related to a problem or challenge?
   
   We're working on running some used-to-be-Spark pipelines through DataFusion. 
One case we've noticed where DataFusion doesn't support something is comparing 
lists. (Spark 
allows)[https://github.com/apache/spark/blame/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L1025]
 comparing (==, !=, <, >, <=, >=, ..) columns of structs and lists, while in 
DataFusion those seem to throw:
   
   For structs, from our internal testing:
   ```
   ArrowError(InvalidArgumentError("Invalid comparison operation: Struct([Field 
{ name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: 
false, metadata: {} }, Field { name: \"b\", data_type: Int32, nullable: false, 
dict_id: 0, dict_is_ordered: false, metadata: {} }]) <= Struct([Field { name: 
\"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, 
metadata: {} }, Field { name: \"b\", data_type: Int32, nullable: false, 
dict_id: 0, dict_is_ordered: false, metadata: {} }])"), None)
   ```
   
   For lists, this is shown in DataFusion's tests: 
https://github.com/apache/datafusion/blob/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion/sqllogictest/test_files/array_query.slt#L44
   
   Maybe this would need to be improved on Arrow directly, seeing that the 
error is coming from 
https://github.com/apache/arrow-rs/blob/087f34b70e97ee85e1a54b3c45c5ed814f500b0a/arrow-ord/src/cmp.rs#L219?
   
   ### Describe the solution you'd like
   
   Binary predicates to be allowed for structs and lists, preferably following 
same semantics as in Spark (mostly I think it's a DFS over all the fields 
https://github.com/apache/spark/blob/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala#L285)
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   Related to https://github.com/apache/datafusion/issues/2326


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to