wiedld opened a new issue, #12105:
URL: https://github.com/apache/datafusion/issues/12105

   ### Is your feature request related to a problem or challenge?
   
   With this [upstream change](https://github.com/influxdata/arrow-datafusion/commit/afa23abc46059e100061295b619b7b66fbc39625), the existing [union() API](https://docs.rs/datafusion-expr/41.0.0/datafusion_expr/logical_plan/builder/fn.union.html) changed behavior. This API is used to construct the UNION logical plan. Previously, it both constructed the UNION plan node and performed type coercion. After the upstream change, it [no longer performs type coercion at logical plan construction](https://github.com/influxdata/arrow-datafusion/blob/main/datafusion/expr/src/logical_plan/builder.rs#L1347-L1349), and instead relies on a later logical plan optimizer pass.
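To make the behavior change concrete, here is a toy model. It is a sketch under stated assumptions, not DataFusion code: `DataType`, `widen`, and both `union_schema_*` functions are hypothetical stand-ins, the widening rule is simplified, and the "deferred" variant assumes (from our reading of the linked builder lines) that `union()` now reports the left input's schema until coercion runs later.

```rust
// Toy model: NOT DataFusion's API. All names here are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Simplified widening: identical types stay as-is, anything mixed with
/// Float64 becomes Float64, and Int32 mixed with Int64 becomes Int64.
fn widen(a: DataType, b: DataType) -> DataType {
    use DataType::*;
    match (a, b) {
        (x, y) if x == y => x,
        (Float64, _) | (_, Float64) => Float64,
        _ => Int64,
    }
}

/// Pre-change behavior: the UNION schema is coerced at construction time.
fn union_schema_coerced(left: &[DataType], right: &[DataType]) -> Vec<DataType> {
    left.iter().zip(right).map(|(&l, &r)| widen(l, r)).collect()
}

/// Post-change behavior (assumed): keep the left input's schema and leave
/// coercion to a later analyzer/optimizer pass.
fn union_schema_deferred(left: &[DataType], _right: &[DataType]) -> Vec<DataType> {
    left.to_vec()
}

fn main() {
    let left = [DataType::Int32, DataType::Int64];
    let right = [DataType::Int64, DataType::Float64];
    println!("coerced:  {:?}", union_schema_coerced(&left, &right));
    println!("deferred: {:?}", union_schema_deferred(&left, &right));
}
```

With a left input of `[Int32, Int64]` and a right input of `[Int64, Float64]`, the coerced schema is `[Int64, Float64]`, while the deferred schema stays `[Int32, Int64]` until the later pass runs.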
   
   This upstream change does not cause any failures in the upstream DataFusion tests. However, it has significant downstream impact for anyone constructing their own logical plans with UNION. Specifically:
   1. We construct a UNION logical plan using the upstream `union()` API.
   2. We consume the schema and exprs of the union when deciding how to construct other logical plan nodes (e.g. sort, grouping, aggregates, limits).
   3. Afterwards, we hand the logical plan to DataFusion for optimization (including type coercion).
   
   When the upstream change removed the UNION type coercion from step 1 above, our logical plan construction code (step 2) started creating incorrect logical plans, because the union schema it consumed no longer reflected the coerced types.
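A toy illustration of how step 2 can go wrong, using hypothetical types rather than DataFusion's API: downstream code derives a SUM output type from the UNION schema it sees at build time. The return-type rule (integers to `Int64`, floats to `Float64`) is a simplified stand-in for the real aggregate logic, and the sketch assumes, per our reading of the linked builder lines, that the uncoerced UNION exposes the left input's column type.

```rust
// Toy model of step 2: NOT DataFusion's API. All names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Simplified SUM return-type rule: integers sum to Int64, floats to Float64.
fn sum_return_type(input: DataType) -> DataType {
    match input {
        DataType::Int32 | DataType::Int64 => DataType::Int64,
        DataType::Float64 => DataType::Float64,
    }
}

/// Simplified widening applied by the later coercion pass (step 3).
fn widen(a: DataType, b: DataType) -> DataType {
    if a == DataType::Float64 || b == DataType::Float64 {
        DataType::Float64
    } else if a == b {
        a
    } else {
        DataType::Int64
    }
}

fn main() {
    // UNION of a column typed Int64 (left) with one typed Float64 (right).
    let (left, right) = (DataType::Int64, DataType::Float64);

    // Step 2: the uncoerced UNION schema exposes only the left type,
    // so the downstream aggregate is declared as returning Int64.
    let declared = sum_return_type(left);

    // Step 3: coercion later widens the UNION column to Float64, so the
    // aggregate's actual return type no longer matches the declared one.
    let actual = sum_return_type(widen(left, right));

    println!("declared: {:?}, actual: {:?}", declared, actual);
    assert_ne!(declared, actual);
}
```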
   
   ### Describe the solution you'd like
   
   Update the `union()` API documentation to describe how to handle the changed behavior when the user still requires UNION type coercion at logical plan construction time. Something like:
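   ```rust
   // Compute a common schema for both inputs (clone them, since `vec!` takes ownership)
   let union_schema = coerce_union_schema(vec![prev.clone(), next.clone()])?;
   // Rewrite each input so its output expressions are cast to the union schema
   let prev = coerce_plan_expr_for_schema(&prev, &union_schema)?;
   let next = coerce_plan_expr_for_schema(&next, &union_schema)?;
   // Build the UNION from the already-coerced inputs
   union(prev, next)
   ```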
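The same recipe can be modeled end to end with toy types. This is a hedged sketch, not DataFusion's implementation: `Plan`, `Expr`, `Union`, `toy_coerce_union_schema`, `toy_coerce_plan`, and `toy_union` are hypothetical stand-ins for `coerce_union_schema`, `coerce_plan_expr_for_schema`, and `union()`, and the numeric widening rule is simplified.

```rust
// Toy model of the proposed recipe: NOT DataFusion's API; all names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Cast(Box<Expr>, DataType),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column(_, t) | Expr::Cast(_, t) => *t,
        }
    }
}

/// A plan reduced to its list of output expressions.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    exprs: Vec<Expr>,
}

impl Plan {
    fn schema(&self) -> Vec<DataType> {
        self.exprs.iter().map(Expr::data_type).collect()
    }
}

/// A UNION node: its inputs plus the schema downstream code will read.
struct Union {
    inputs: Vec<Plan>,
    schema: Vec<DataType>,
}

/// Simplified numeric widening.
fn widen(a: DataType, b: DataType) -> DataType {
    use DataType::*;
    match (a, b) {
        (x, y) if x == y => x,
        (Float64, _) | (_, Float64) => Float64,
        _ => Int64,
    }
}

/// Stand-in for `coerce_union_schema`: widen column types across all inputs.
fn toy_coerce_union_schema(inputs: &[&Plan]) -> Result<Vec<DataType>, String> {
    let mut schema = inputs.first().ok_or("UNION needs at least one input")?.schema();
    for plan in &inputs[1..] {
        let other = plan.schema();
        if other.len() != schema.len() {
            return Err(format!("expected {} columns, got {}", schema.len(), other.len()));
        }
        for (t, o) in schema.iter_mut().zip(other) {
            *t = widen(*t, o);
        }
    }
    Ok(schema)
}

/// Stand-in for `coerce_plan_expr_for_schema`: cast mismatched columns.
fn toy_coerce_plan(plan: &Plan, schema: &[DataType]) -> Plan {
    let exprs = plan.exprs.iter().zip(schema).map(|(e, &t)| {
        if e.data_type() == t {
            e.clone() // already the right type: keep the column as-is
        } else {
            Expr::Cast(Box::new(e.clone()), t) // wrap in a cast to the union type
        }
    });
    Plan { exprs: exprs.collect() }
}

/// Stand-in for the post-change `union()`: take the left schema, no coercion.
fn toy_union(left: Plan, right: Plan) -> Union {
    let schema = left.schema();
    Union { inputs: vec![left, right], schema }
}

fn main() -> Result<(), String> {
    let prev = Plan { exprs: vec![Expr::Column("a".into(), DataType::Int32)] };
    let next = Plan { exprs: vec![Expr::Column("b".into(), DataType::Float64)] };

    let union_schema = toy_coerce_union_schema(&[&prev, &next])?;
    let prev = toy_coerce_plan(&prev, &union_schema);
    let next = toy_coerce_plan(&next, &union_schema);
    let plan = toy_union(prev, next);

    // Every input now matches the schema that downstream code reads.
    assert!(plan.inputs.iter().all(|p| p.schema() == plan.schema));
    println!("{:?}", plan.schema);
    Ok(())
}
```

The design point is that the casts are inserted below the UNION, so whichever input's schema the union node reports, it already agrees with every input by the time downstream nodes are built.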
   
   The code above relies on the already-public [coerce_plan_expr_for_schema](https://docs.rs/datafusion-expr/41.0.0/datafusion_expr/expr_rewriter/fn.coerce_plan_expr_for_schema.html). The [coerce_union_schema](https://github.com/apache/datafusion/blob/846befb6a620d3b8c0c7ff01be7c35c45fb72360/datafusion/optimizer/src/analyzer/type_coercion.rs#L812) function is not yet public, so we recommend making it public.
   
   ### Describe alternatives you've considered
   
   The original change was made by @jonahgao, who may have some alternative ideas.
   
   ### Additional context
   
   As a temporary measure, we have made our own version of `coerce_union_schema` to fix the issue, and can confirm that it is sufficient. We are also open to other ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

