alamb commented on issue #19049:
URL: https://github.com/apache/datafusion/issues/19049#issuecomment-3643344588

   > It's not clear to me whether or not this is an actual bug. It seems 
reasonable to expect metadata to be consistent for field names across union 
branches. However it could also be problematic for queries that either:
   
   In my mind it is a bug because the query is very reasonable -- I think there 
are other types of plans in queries where the metadata from different inputs 
needs to be combined (and thus might not be the same at the input and output -- 
for example joins and aggregates)
   
   It seems like there is an implicit assumption that "the schema (including 
metadata) of a plan should remain the same after an optimizer pass". If this is 
indeed correct, then by your analysis above
   
   > [optimize_projections](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/optimize_projections/mod.rs) [calls](https://github.com/apache/datafusion/blob/7b4593f36e880ca1c43746d5c4465fff5a3901c3/datafusion/optimizer/src/optimize_projections/mod.rs#L468) [recompute_schema](https://github.com/apache/arrow-datafusion/blob/7b4593f36e880ca1c43746d5c4465fff5a3901c3/datafusion/expr/src/logical_plan/plan.rs#L624-L756) since the plan has changed. recompute_schema sees that the number of fields has changed and [creates a new Union node](https://github.com/influxdata/arrow-datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/expr/src/logical_plan/plan.rs#L718) with [Union::try_new](https://github.com/influxdata/arrow-datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/expr/src/logical_plan/plan.rs#L718)
   
   
   It seems like we should fix optimize_projections so it maintains the schema 
(either by attaching metadata to the NULL literal, or perhaps by simply reusing 
the previous schema).
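
To make the "reuse the previous schema" option concrete, here is a rough, self-contained sketch. It does not use DataFusion's real types: `Field`, `Schema`, `reuse_previous_schema` and `metadata_preserved` are all hypothetical stand-ins I made up for illustration, and the actual fix in optimize_projections may look quite different. The idea is only that, when columns are pruned, the new schema is built by projecting the old one, so field metadata carries over instead of being re-derived from the inputs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an Arrow field: a name plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: BTreeMap<String, String>,
}

impl Field {
    fn new(name: &str, metadata: &[(&str, &str)]) -> Self {
        Field {
            name: name.to_string(),
            metadata: metadata
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// Hypothetical stand-in for a plan's output schema.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// Build the post-pruning schema by projecting the previous schema onto the
/// kept column indices, so each surviving field keeps its metadata.
/// Returns `None` if any index is out of bounds.
fn reuse_previous_schema(previous: &Schema, kept: &[usize]) -> Option<Schema> {
    let fields = kept
        .iter()
        .map(|&i| previous.fields.get(i).cloned())
        .collect::<Option<Vec<Field>>>()?;
    Some(Schema { fields })
}

/// Check the assumed invariant: every kept column in `candidate` is identical
/// (name and metadata) to the corresponding column of `previous`.
fn metadata_preserved(previous: &Schema, kept: &[usize], candidate: &Schema) -> bool {
    kept.len() == candidate.fields.len()
        && kept
            .iter()
            .zip(&candidate.fields)
            .all(|(&i, field)| previous.fields.get(i) == Some(field))
}
```

A check like `metadata_preserved` could serve as a debug assertion after an optimizer pass. A schema recomputed from an input branch that carries no metadata (such as a NULL literal) would fail it, while the projected schema passes.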
   
   For example, @adriangb just added code that does something similar (though 
during execution) in this PR https://github.com/apache/datafusion/pull/19111 : 
   
   
https://github.com/apache/datafusion/blob/bde16083ad344b7a52db5cb298a15d7434ffde51/datafusion/datasource-parquet/src/opener.rs#L529-L545


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

