Re: [PR] Using Union's input schema when recompute schema [datafusion]

via GitHub Mon, 13 May 2024 21:50:24 -0700


yyy1000 commented on code in PR #10494:
URL: https://github.com/apache/datafusion/pull/10494#discussion_r1599375120



##########
datafusion/optimizer/src/propagate_empty_relation.rs:
##########
@@ -154,14 +156,14 @@ impl OptimizerRule for PropagateEmptyRelation {
                         Ok(Transformed::yes(LogicalPlan::Projection(
                             Projection::new_from_schema(
                                 Arc::new(child),
-                                plan.schema().clone(),
+                                input_schema.clone(),
                             ),
                         )))
                     }
                 } else {
                     Ok(Transformed::yes(LogicalPlan::Union(Union {
                         inputs: new_inputs,
-                        schema: union.schema.clone(),
+                        schema: input_schema.clone(),

Review Comment:
   I think one solution may be: when creating the schema for a plan, for 
`Alias { expr, name }` use the expr's name rather than `name`. However, this 
would also be a big change, I think. 🥲
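To make the two options concrete, here is a minimal sketch that uses a hypothetical, simplified plan model. `Plan`, `qualifiers_using_alias_name`, and `qualifiers_using_input_name` are illustrative names only, not DataFusion types or APIs. The sketch shows where the output qualifier comes from:

- With the current behaviour, an alias re-qualifies its fields with the alias name.
- With the proposed behaviour, the alias keeps its input's qualifier.

```rust
/// Hypothetical, simplified plan model (NOT DataFusion's real types),
/// used only to show where an output field's qualifier comes from.
#[derive(Debug, Clone)]
enum Plan {
    /// A table scan: output fields are qualified by the table name.
    Scan { table: String, columns: Vec<String> },
    /// An alias over another plan (compare `Alias { expr, name }` above).
    Alias { input: Box<Plan>, name: String },
}

/// Current behaviour in this sketch: an alias re-qualifies every field
/// with its own `name`. Returns (qualifier, column) pairs.
fn qualifiers_using_alias_name(plan: &Plan) -> Vec<(String, String)> {
    match plan {
        Plan::Scan { table, columns } => columns
            .iter()
            .map(|c| (table.clone(), c.clone()))
            .collect(),
        Plan::Alias { input, name } => qualifiers_using_alias_name(input)
            .into_iter()
            .map(|(_, c)| (name.clone(), c))
            .collect(),
    }
}

/// Proposed behaviour in this sketch: an alias keeps the qualifier that
/// its input already produces, ignoring `name`.
fn qualifiers_using_input_name(plan: &Plan) -> Vec<(String, String)> {
    match plan {
        Plan::Scan { table, columns } => columns
            .iter()
            .map(|c| (table.clone(), c.clone()))
            .collect(),
        Plan::Alias { input, .. } => qualifiers_using_input_name(input),
    }
}
```

For a scan of `test` aliased as `a`, the first function yields the qualifier `a` and the second yields `test`. In a real planner, the second option would change the visible qualifier of every aliased plan, which is why it is likely a large change.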




Review Comment:
   From the example, after `propagate_empty_relation`, the new inputs of 
`Union` no longer have the alias `a`; only `test` exists.
   If we use the new inputs' schema, it will be 
   `DFSchema { inner: Schema { fields: [Field { name: "col_int32", data_type: 
Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], 
metadata: {} }, field_qualifiers: [Some(Bare { table: "test" })], 
functional_dependencies: FunctionalDependencies { deps: [] } }`
   
   However, the schema before the rule runs is `DFSchema { inner: Schema { fields: [Field { 
name: "col_int32", data_type: Int32, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: 
[Some(Bare { table: "a" })], functional_dependencies: FunctionalDependencies { 
deps: [] } }`
   
   The only difference is `field_qualifiers`, but `assert_schema_is_the_same` 
will throw an error if the schema changes during optimization. 🤔
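The following is a minimal sketch of why a qualifier-only difference is enough to fail the check. It assumes the check compares each field's qualified name (`qualifier.name`) together with its type. That is an assumption about `assert_schema_is_the_same`, not its actual implementation. `QualifiedField`, `same_qualified_names_and_types`, and `same_names_and_types` are hypothetical stand-ins, not DataFusion APIs.

```rust
/// Hypothetical stand-in for one DFSchema field: an optional table
/// qualifier plus the Arrow field's name and a type label.
#[derive(Debug, Clone, PartialEq)]
struct QualifiedField {
    qualifier: Option<String>,
    name: String,
    data_type: String,
}

impl QualifiedField {
    /// Renders the field as `qualifier.name`, or just `name` if unqualified.
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Strict check (assumed to be similar in spirit to the optimizer's
/// schema check): same field count, and each pair of fields agrees on
/// qualified name and type, so `a.col_int32` != `test.col_int32`.
fn same_qualified_names_and_types(a: &[QualifiedField], b: &[QualifiedField]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.qualified_name() == y.qualified_name() && x.data_type == y.data_type
        })
}

/// Relaxed check: ignores qualifiers, compares only names and types.
fn same_names_and_types(a: &[QualifiedField], b: &[QualifiedField]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.name == y.name && x.data_type == y.data_type)
}
```

With the two schemas above, `[a.col_int32: Int32]` vs `[test.col_int32: Int32]`, the strict comparison fails while the relaxed one passes. That matches the situation described here: the fields are otherwise identical, but the qualifier change alone trips the check.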



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
