findepi opened a new issue, #14357:
URL: https://github.com/apache/datafusion/issues/14357
The basic assumption that for a given operator we can recompute its schema
from inputs' schema is unsound.
- metadata: for plans constructed from SQL, metadata will usually be empty,
but an application can attach additional metadata to a schema or field. The
metadata can be assigned on the relational operator (its schema or one of its
fields) and may not be derivable from the inputs.
- for examples of metadata usage see
https://github.com/apache/datafusion/issues/14247,
https://github.com/apache/datafusion/issues/12644, but also other,
non-type-related use cases, like primary ID tracking
- field qualification: a plan node may have field qualification retained
from its inputs, erased, or reassigned. At optimization time, we cannot simply
assume one way or the other.
- DataFusion deals with plans created by its own frontend, but DataFusion
is also a library. It also deals with plans constructed by other frontends
(https://github.com/apache/datafusion/issues/12723). Optimizers need to take
any valid plan and produce a valid plan.
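
To make the points above concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: `Field`, `recompute_from_input`, and `node_schema_from_frontend` are hypothetical stand-ins, and the "recompute" rule shown (copy the input fields unchanged) is just one assumption a recomputation could make. The sketch shows that a node whose frontend reassigned a qualifier and attached metadata ends up with a different schema once that schema is recomputed from its input.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified model of a qualified field with metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    metadata: BTreeMap<String, String>,
}

// Hypothetical recompute rule: assume the node simply inherits its input's fields.
fn recompute_from_input(input: &[Field]) -> Vec<Field> {
    input.to_vec()
}

// Schema as some application frontend might have built it for the node:
// the qualifier is reassigned and extra metadata is attached on the node's field.
fn node_schema_from_frontend(input: &[Field]) -> Vec<Field> {
    input
        .iter()
        .map(|f| {
            let mut out = f.clone();
            out.qualifier = Some("s".to_string());
            out.metadata
                .insert("primary_id".to_string(), "true".to_string());
            out
        })
        .collect()
}

fn main() {
    let input = vec![Field {
        qualifier: Some("t".to_string()),
        name: "id".to_string(),
        metadata: BTreeMap::new(),
    }];

    let original = node_schema_from_frontend(&input);
    let recomputed = recompute_from_input(&input);

    // The recomputed schema has lost the reassigned qualifier and the metadata.
    assert_ne!(original, recomputed);
    println!("original:   {:?}", original);
    println!("recomputed: {:?}", recomputed);
}
```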
The usage of `recompute_schema` within the optimizer should be replaced with
explicit node schema updates.
For example, when pruning inputs with `RequiredIndices`, the node's schema
should be pruned the same way, not recomputed anew.
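
A rough sketch of what "pruned the same way" could look like, again with hypothetical standard-library-only types rather than DataFusion's actual `DFSchema` or `RequiredIndices` APIs: the same index list that prunes the input is applied directly to the node's existing schema, so qualifiers, field metadata, and schema-level metadata carry over untouched.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified model of a qualified field with metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    metadata: BTreeMap<String, String>,
}

// Hypothetical, simplified model of a node schema.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

// Keep only the fields at `required` positions, in the order given.
// Each kept field and the schema-level metadata are cloned unchanged.
// Panics if an index is out of bounds (acceptable for a sketch).
fn prune_schema(schema: &Schema, required: &[usize]) -> Schema {
    Schema {
        fields: required.iter().map(|&i| schema.fields[i].clone()).collect(),
        metadata: schema.metadata.clone(),
    }
}

fn field(qualifier: &str, name: &str, meta: &[(&str, &str)]) -> Field {
    Field {
        qualifier: Some(qualifier.to_string()),
        name: name.to_string(),
        metadata: meta
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

fn main() {
    let schema = Schema {
        fields: vec![
            field("s", "id", &[("primary_id", "true")]),
            field("s", "unused", &[]),
            field("s", "amount", &[]),
        ],
        metadata: BTreeMap::from([("origin".to_string(), "custom-frontend".to_string())]),
    };

    // Indices 0 and 2 are the ones still required after pruning.
    let pruned = prune_schema(&schema, &[0, 2]);

    assert_eq!(pruned.fields.len(), 2);
    assert_eq!(pruned.fields[0], schema.fields[0]); // metadata and qualifier intact
    assert_eq!(pruned.metadata, schema.metadata);
    println!("{:?}", pruned);
}
```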
The usage of `recompute_schema` within the analyzer is left for a different
issue.