alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613389124
> I am fine with DF not shipping extension types (ie no extension types until we add them explicitly in [#12644](https://github.com/apache/datafusion/issues/12644)). Let's look at the example. I have `Field` information for `a` and `b` columns, both are `DataType::Binary`. What should be the behavior of `a = b` according to DF core's logic? Naive answer would be that given same DataType, the expression is valid and Arrow's comparison function should be used.

I agree that the code as written today would compare the columns as binary rather than as the user defined type.

> However, if one (or both) of them happen to be custom-provided extension type (e.g. JSON, VARIANT, ST_Geometry), this logic should not be used at all, not even as a fallback.

I also agree with this.

> So the bare minimum is -- given a Field, DF core needs to understand whether this is a type it knows about or a type it doesn't know about.... So we're almost back to explicit extension types.

Here are some possible ways we could support `=` on a user defined type:

## Option 1: User defined operators

In this case, we would let users override `=`, similar to a UDF, to implement support for extension types. The user's implementation could then fall back to the built-in `=` implementation when it has no special rules for the argument types.

I think this might get tricky when multiple extension types are used (it might be hard to hook up both JSON and geometry without a bunch of glue code).

## Option 2: Custom analyzer rules

In this case the extension could add a custom Analyzer rule that walks over all plan `Expr`s, finds any that are relevant to the user defined type, and rewrites them to use a function (e.g. rewrite `geo_col1 = geo_col2` into `udf_compare_geos(geo_col1, geo_col2)`).

This might not be ideal, as there would likely be a lot of replicated code across extensions (such as matching on equality).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
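
To make the Option 2 rewrite from the comment above concrete, here is a minimal, self-contained sketch of the idea. It is not DataFusion code: the `Expr` enum, the `is_geo` flag, and the `col` and `rewrite` helpers are hypothetical stand-ins for illustration only; only the name `udf_compare_geos` comes from the comment. A real extension would presumably work against DataFusion's own `Expr` and analyzer rule interfaces and would identify the extension type from `Field` information rather than a boolean flag.

```rust
// Toy expression tree: a hypothetical stand-in, NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // `is_geo` stands in for "this column's Field carries the geometry extension type".
    Column { name: String, is_geo: bool },
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Call { func: String, args: Vec<Expr> },
}

// Hypothetical helper to build a column reference.
fn col(name: &str, is_geo: bool) -> Expr {
    Expr::Column { name: name.to_string(), is_geo }
}

// True when the expression is a column of the (assumed) geometry extension type.
fn is_geo(e: &Expr) -> bool {
    matches!(e, Expr::Column { is_geo: true, .. })
}

// Walk the tree bottom-up and replace `l = r` with `udf_compare_geos(l, r)`
// whenever either side is a geometry column. All other equalities are kept,
// so the built-in comparison still applies to ordinary types.
fn rewrite(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => {
            let (l, r) = (rewrite(*l), rewrite(*r));
            if is_geo(&l) || is_geo(&r) {
                Expr::Call {
                    func: "udf_compare_geos".to_string(),
                    args: vec![l, r],
                }
            } else {
                Expr::Eq(Box::new(l), Box::new(r))
            }
        }
        Expr::And(l, r) => Expr::And(Box::new(rewrite(*l)), Box::new(rewrite(*r))),
        Expr::Call { func, args } => Expr::Call {
            func,
            args: args.into_iter().map(rewrite).collect(),
        },
        other => other,
    }
}
```

In this sketch, calling `rewrite` on `geo_col1 = geo_col2 AND a = b` turns only the first comparison into a `udf_compare_geos` call. The `match` on `Expr::Eq` is exactly the kind of logic each extension would have to repeat, which is the duplication concern raised under Option 2.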
