findepi commented on issue #14247:
URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613408823
Yes, once the plan is lowered into "container" Arrow types (like assembly), we
no longer need to remember what the logical/extension types were. Before the
lowering happens, the functions and operators need to be resolved. This doesn't
happen at Expr construction time, though, so IMO it calls for a strict
separation of phases:
1. Exprs are syntactical (e.g. created directly from SQL syntax or the dataframe
API).
2. Then the analyzer needs to "resolve" operators. This needs to be type-aware.
E.g. for builtin types it can use `=` operators, but for other types it needs to
be told how to perform comparisons (and such), so a UDF gets inserted into the
plan.
- After this phase, the plan is "resolved" and doesn't need to remember
the original types (except maybe for output fields metadata).
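
To make the two phases concrete, here is a minimal sketch in plain Rust. It is a toy model, not DataFusion's API: `SyntaxExpr`, `ResolvedExpr`, `LogicalType`, `Container`, `resolve` and the `eq_udfs` registry are all hypothetical names made up for illustration, and the UDF name is borrowed from the example quoted further down in this comment. The resolved tree only carries container types, which is the point of the separation.

```rust
// Toy model only: every type and function here is hypothetical,
// none of this is DataFusion's actual API.
use std::collections::HashMap;

/// Physical "container" types that remain after lowering.
#[derive(Debug, Clone, PartialEq)]
enum Container { Boolean, Int64, Utf8, Binary }

/// Logical types known to the analyzer before lowering.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    Boolean,
    Int64,
    Utf8,
    /// An extension type: a name plus the container it is stored in.
    Extension { name: String, container: Container },
}

/// Phase 1: purely syntactic expressions (no type information).
#[derive(Debug, Clone, PartialEq)]
enum SyntaxExpr {
    Column(String),
    Eq(Box<SyntaxExpr>, Box<SyntaxExpr>),
}

/// Phase 2 output: operators are resolved, only container types remain.
#[derive(Debug, Clone, PartialEq)]
enum ResolvedExpr {
    Column { name: String, container: Container },
    NativeEq(Box<ResolvedExpr>, Box<ResolvedExpr>),
    UdfCall { udf: String, args: Vec<ResolvedExpr> },
}

fn container_of(ty: &LogicalType) -> Container {
    match ty {
        LogicalType::Boolean => Container::Boolean,
        LogicalType::Int64 => Container::Int64,
        LogicalType::Utf8 => Container::Utf8,
        LogicalType::Extension { container, .. } => container.clone(),
    }
}

/// Type-aware resolution. `eq_udfs` maps an extension type name to the
/// UDF that implements `=` for it. Returns the resolved expression and
/// its logical type (needed only while resolving the parent node).
fn resolve(
    expr: &SyntaxExpr,
    schema: &HashMap<String, LogicalType>,
    eq_udfs: &HashMap<String, String>,
) -> Result<(ResolvedExpr, LogicalType), String> {
    match expr {
        SyntaxExpr::Column(name) => {
            let ty = schema
                .get(name)
                .ok_or_else(|| format!("unknown column {name}"))?;
            let col = ResolvedExpr::Column {
                name: name.clone(),
                container: container_of(ty),
            };
            Ok((col, ty.clone()))
        }
        SyntaxExpr::Eq(left, right) => {
            let (l, lt) = resolve(left, schema, eq_udfs)?;
            let (r, rt) = resolve(right, schema, eq_udfs)?;
            if lt != rt {
                // This sketch has no coercion rules at all.
                return Err(format!("cannot compare {lt:?} with {rt:?}"));
            }
            let resolved = match &lt {
                // Extension type: the analyzer must be told how to compare.
                LogicalType::Extension { name, .. } => {
                    let udf = eq_udfs
                        .get(name)
                        .ok_or_else(|| format!("no `=` registered for {name}"))?;
                    ResolvedExpr::UdfCall { udf: udf.clone(), args: vec![l, r] }
                }
                // Builtin type: the native `=` operator applies.
                _ => ResolvedExpr::NativeEq(Box::new(l), Box::new(r)),
            };
            Ok((resolved, LogicalType::Boolean))
        }
    }
}
```

With a schema where `geo_col1` and `geo_col2` are both a `geometry` extension type stored as `Binary`, and `eq_udfs` mapping `geometry` to `udf_compare_geos`, resolving `geo_col1 = geo_col2` yields a `UdfCall`, while the same comparison on two `Int64` columns yields a `NativeEq`.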
> I think this might get tricky when multiple extension types were used (it
might be hard to hook json and geometry without a bunch of glue code)
Extensible coercion rules are a tricky thing indeed. Maybe we can live
without them (for now).
But there are simpler things to solve as well, like casts: If "my JSON" type
uses DataType::Binary as its container type, it still wants to define its own
family of casts to various other types (numbers, text, etc.). So the Cast Expr
would need to resolve to some UDF, when source type or target type are not
native types.
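
As a rough illustration of that cast resolution, here is a self-contained toy in plain Rust. Nothing here is DataFusion's API: `ResolvedCast`, `CastRegistry`, `resolve_cast` and the UDF name `my_json_to_int64` are hypothetical, and the sketch assumes casts are looked up by an exact (source, target) pair.

```rust
// Toy model only: hypothetical names, not DataFusion's actual API.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum LogicalType {
    Int64,
    Utf8,
    Binary,
    /// Extension type identified by name; its container type is not
    /// relevant for choosing a cast in this sketch.
    Extension(String),
}

#[derive(Debug, PartialEq)]
enum ResolvedCast {
    /// Both sides are native types: the built-in cast applies.
    Native { to: LogicalType },
    /// At least one side is an extension type: call a registered UDF.
    Udf { name: String },
}

/// Casts defined by extension types, keyed by (source, target).
type CastRegistry = HashMap<(LogicalType, LogicalType), String>;

fn resolve_cast(
    from: &LogicalType,
    to: &LogicalType,
    registry: &CastRegistry,
) -> Result<ResolvedCast, String> {
    let is_ext = |t: &LogicalType| matches!(t, LogicalType::Extension(_));
    if !is_ext(from) && !is_ext(to) {
        return Ok(ResolvedCast::Native { to: to.clone() });
    }
    // An extension type is involved, so the container type alone must
    // not decide the cast; only an explicitly registered UDF is allowed.
    registry
        .get(&(from.clone(), to.clone()))
        .map(|name| ResolvedCast::Udf { name: name.clone() })
        .ok_or_else(|| format!("no cast registered from {from:?} to {to:?}"))
}
```

Note that in this sketch a cast from the JSON extension type to `Binary` fails unless it is registered, even though `Binary` is its container: the extension type, not the container, owns its family of casts.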
> finding any that were relevant to the user defined type and rewriting the
expressions to use a function (eg. rewrite `geo_col1 = geo_col2` into
`udf_compare_geos(geo_col1, geo_col2)`
That sounds easy because we don't have to write this logic even once.
But once such logic is written somewhere, there is no reason for it not to
be part of datafusion project, for the benefit of all consumers. I think such
logic should belong to datafusion.