alamb commented on issue #14247:
URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613389124

   > I am fine with DF not shipping extension types (ie no extension types 
until we add them explicitly in 
[#12644](https://github.com/apache/datafusion/issues/12644)). Let's look at the 
example. I have `Field` information for `a` and `b` columns, both are 
`DataType::Binary`. What should be the behavior of `a = b` according to DF 
core's logic? Naive answer would be that given same DataType, the expression is 
valid and Arrow's comparison function should be used. 
   
   I agree that the code as written today would compare the columns as binary 
rather than the user defined type.
   
   > However, if one (or both) of them happen to be custom-provided extension 
type (e.g. JSON, VARIANT, ST_Geometry), this logic should not be used at all, 
not even as a fallback.
   
   I also agree with this. 
    
   > So the bare minimum is -- given a Field, DF core needs to understand 
whether this is a type it knows about or a type it doesn't know about.... So 
we're almost back to explicit extension types.
   
   Here are some possible ways we could support `=` on a user defined type:
   
   ## Option 1: User defined operators
   
   In this case, we would let users override `=`, similar to a UDF, to implement 
support for extension types. The user's implementation could then fall back to 
the built-in `=` implementation when it has no special rules.
   
   I think this might get tricky when multiple extension types are used (it 
might be hard to hook up both json and geometry without a bunch of glue code).
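
   A minimal sketch of the override-with-fallback idea, using toy types rather 
than DataFusion's real API. `Value`, `EqOverride`, `eval_eq`, and `json_eq` are 
all hypothetical names invented for illustration, and the whitespace-stripping 
JSON comparison is a deliberately naive stand-in for real JSON semantics.

```rust
/// Toy column value: raw bytes plus an optional extension type name.
/// A stand-in for an Arrow `Binary` value and its `Field` metadata.
#[derive(Debug, Clone)]
struct Value {
    bytes: Vec<u8>,
    extension: Option<String>,
}

/// Hypothetical signature for a user-supplied `=`.
/// `None` means "no special rule here, defer to the built-in".
type EqOverride = fn(&Value, &Value) -> Option<bool>;

/// Stand-in for the built-in comparison: plain byte equality.
fn builtin_eq(a: &Value, b: &Value) -> bool {
    a.bytes == b.bytes
}

/// Evaluate `a = b`: try the user override first, then fall back.
fn eval_eq(user: Option<EqOverride>, a: &Value, b: &Value) -> bool {
    user.and_then(|f| f(a, b))
        .unwrap_or_else(|| builtin_eq(a, b))
}

/// Example override for a hypothetical "json" extension type.
/// Toy semantics: two JSON values are equal if their bytes match
/// after removing ASCII whitespace.
fn json_eq(a: &Value, b: &Value) -> Option<bool> {
    let is_json = |v: &Value| v.extension.as_deref() == Some("json");
    if is_json(a) && is_json(b) {
        let strip = |v: &Value| -> Vec<u8> {
            v.bytes
                .iter()
                .copied()
                .filter(|c| !c.is_ascii_whitespace())
                .collect()
        };
        Some(strip(a) == strip(b))
    } else {
        // Not our type: let the built-in `=` handle it.
        None
    }
}
```

   With `json_eq` installed, two "json" values that differ only in whitespace 
compare equal, while values without an extension type still go through 
`builtin_eq`. Only one override slot exists in this sketch, so supporting both 
json and geometry would need a combined function, which is the glue code 
concern above.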
   
   ## Option 2: Custom analyzer rules
   
   In this case the extension could add a custom analyzer rule that walks over 
all plan `Expr`s, finds any that are relevant to the user defined type, and 
rewrites those expressions to use a function (e.g. rewrite `geo_col1 = geo_col2` 
into `udf_compare_geos(geo_col1, geo_col2)`).
   
   This might not be ideal as there would likely be a lot of replicated code in 
extensions (like matching on equality).
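
   The rewrite described above can be sketched with a toy expression tree. 
This is not DataFusion's `Expr` or its analyzer rule interface: the `Expr` 
enum, the `ExtensionTypes` map, and `rewrite_geo_eq` are hypothetical, and the 
"geometry" type name is an assumption. Only `udf_compare_geos` comes from the 
example in the text.

```rust
use std::collections::HashMap;

/// Toy expression tree; a stand-in for a query engine's expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Hypothetical schema info: column name -> extension type name.
type ExtensionTypes = HashMap<String, String>;

/// Look up the extension type of an expression (columns only).
fn extension_of<'a>(expr: &Expr, types: &'a ExtensionTypes) -> Option<&'a str> {
    match expr {
        Expr::Column(name) => types.get(name).map(String::as_str),
        _ => None,
    }
}

/// Walk the tree bottom-up and rewrite `a = b` into
/// `udf_compare_geos(a, b)` when both sides are geometry columns.
fn rewrite_geo_eq(expr: Expr, types: &ExtensionTypes) -> Expr {
    match expr {
        Expr::Eq(l, r) => {
            let l = rewrite_geo_eq(*l, types);
            let r = rewrite_geo_eq(*r, types);
            let both_geo = extension_of(&l, types) == Some("geometry")
                && extension_of(&r, types) == Some("geometry");
            if both_geo {
                Expr::ScalarFunction {
                    name: "udf_compare_geos".to_string(),
                    args: vec![l, r],
                }
            } else {
                // Not a geometry comparison: leave `=` untouched.
                Expr::Eq(Box::new(l), Box::new(r))
            }
        }
        Expr::And(l, r) => Expr::And(
            Box::new(rewrite_geo_eq(*l, types)),
            Box::new(rewrite_geo_eq(*r, types)),
        ),
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(|a| rewrite_geo_eq(a, types)).collect(),
        },
        other @ Expr::Column(_) => other,
    }
}
```

   The `Expr::Eq` arm is the "matching on equality" that every extension would 
have to repeat for its own type, along with similar arms for any other operator 
it wants to support.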

