wiedld opened a new issue, #12105: URL: https://github.com/apache/datafusion/issues/12105
### Is your feature request related to a problem or challenge?

With this [upstream change](https://github.com/influxdata/arrow-datafusion/commit/afa23abc46059e100061295b619b7b66fbc39625), an existing [union() API](https://docs.rs/datafusion-expr/41.0.0/datafusion_expr/logical_plan/builder/fn.union.html) changed its behavior. This API is used to construct the UNION logical plan. Previously, it constructed the UNION plan node and also performed type coercion. After the upstream change, it [no longer performs type coercion at logical plan construction](https://github.com/influxdata/arrow-datafusion/blob/main/datafusion/expr/src/logical_plan/builder.rs#L1347-L1349) and instead relies on a later logical plan optimizer pass.

This upstream change does not cause any failures in the upstream DataFusion tests. However, it has significant downstream impact for anyone who constructs their own logical plans containing UNION. Specifically:

1. We construct a UNION logical plan using the upstream `union()` API.
2. We consume the schema and expressions from the union when deciding how to construct other logical plan nodes (e.g. sort, grouping, aggregates, limits).
3. Afterwards, we hand the logical plan to DataFusion for optimization (including type coercion).

When the upstream change removed the UNION type coercion from step 1, our logical plan construction code (step 2) started creating incorrect logical plans.

### Describe the solution you'd like

Update the `union()` API with guidance on how to handle the changed behavior when the user still requires UNION type coercion at logical plan construction.
Something like:

```
// NOTE: `coerce_union_schema` is not yet public; this shows the proposed usage.
// Compute a common schema that both UNION inputs can be coerced to.
let union_schema = coerce_union_schema(vec![prev, next])?;
// Rewrite each input's expressions so their types match the union schema.
let prev = coerce_plan_expr_for_schema(prev, &union_schema)?;
let next = coerce_plan_expr_for_schema(next, &union_schema)?;
// Build the UNION node from the already-coerced inputs.
union(prev, next)
```

The above code relies on the already public [coerce_plan_expr_for_schema](https://docs.rs/datafusion-expr/41.0.0/datafusion_expr/expr_rewriter/fn.coerce_plan_expr_for_schema.html). The [coerce_union_schema](https://github.com/apache/datafusion/blob/846befb6a620d3b8c0c7ff01be7c35c45fb72360/datafusion/optimizer/src/analyzer/type_coercion.rs#L812) function is not yet public, so we recommend making it public.

### Describe alternatives you've considered

The original change was made by @jonahgao, who may have some alternative ideas.

### Additional context

We have written our own version of `coerce_union_schema` as a temporary fix and can confirm that it is sufficient. We are also open to other ideas.
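To illustrate what the requested `coerce_union_schema` is expected to compute, here is a minimal, standard-library-only Rust sketch. It is a toy model under stated assumptions: the `DataType` enum, the widening rules in `common_type`, and the name `coerce_union_schema_sketch` are hypothetical simplifications, not DataFusion's API or its actual coercion rules. It folds the UNION inputs column by column into one schema, which is the information step 2 depends on: a column typed `Int32` in one input and `Int64` in another only becomes `Int64` after coercion.

```rust
// Toy model of UNION schema coercion. `DataType`, `common_type`, and
// `coerce_union_schema_sketch` are hypothetical; they are NOT DataFusion's
// API or its real type-coercion rules.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical widening rule: return a type both inputs can be cast to,
/// or `None` if this toy model treats the pair as incompatible.
fn common_type(a: DataType, b: DataType) -> Option<DataType> {
    use DataType::*;
    match (a, b) {
        (x, y) if x == y => Some(x),
        (Int32, Int64) | (Int64, Int32) => Some(Int64),
        (Int32 | Int64, Float64) | (Float64, Int32 | Int64) => Some(Float64),
        _ => None,
    }
}

/// Fold all input schemas (lists of `(column name, type)`) into one schema
/// whose column `i` has the common type of column `i` across every input.
/// Column names are taken from the first input.
fn coerce_union_schema_sketch(
    inputs: &[Vec<(String, DataType)>],
) -> Result<Vec<(String, DataType)>, String> {
    let (first, rest) = inputs
        .split_first()
        .ok_or("UNION needs at least one input")?;
    let mut schema = first.clone();
    for (n, input) in rest.iter().enumerate() {
        // Every UNION input must have the same number of columns.
        if input.len() != schema.len() {
            return Err(format!(
                "input {} has {} columns, expected {}",
                n + 1,
                input.len(),
                schema.len()
            ));
        }
        // Widen each column's type pairwise against this input.
        for (field, (_, ty)) in schema.iter_mut().zip(input) {
            let merged = common_type(field.1, *ty).ok_or_else(|| {
                format!("cannot coerce {:?} and {:?} for column {}", field.1, ty, field.0)
            })?;
            field.1 = merged;
        }
    }
    Ok(schema)
}
```

For example, unioning an input with column `a: Int32` and one with `a: Int64` yields `a: Int64`, while any code that reads column types from a single un-coerced input would still see `Int32`.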
