[GitHub] [arrow-datafusion] wolffcm commented on a diff in pull request #6077: fix: make simplify_expressions use a single schema for resolution

via GitHub Fri, 21 Apr 2023 14:57:53 -0700


wolffcm commented on code in PR #6077:
URL: https://github.com/apache/arrow-datafusion/pull/6077#discussion_r1174187518



##########
datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs:
##########
@@ -61,16 +63,14 @@ impl SimplifyExpressions {
         plan: &LogicalPlan,
         execution_props: &ExecutionProps,
     ) -> Result<LogicalPlan> {
-        // Pass down the `children merge schema` and `plan schema` to evaluate expression types.
-        // pass all `child schema` and `plan schema` isn't enough, because like `t1 semi join t2 on
-        // on t1.id = t2.id`, each individual schema can't contain all the columns in it.
-        let children_merge_schema = DFSchemaRef::new(merge_schema(plan.inputs()));
-        let schemas = vec![plan.schema(), &children_merge_schema];
-        let info = schemas
-            .into_iter()
-            .fold(SimplifyContext::new(execution_props), |context, schema| {
-                context.with_schema(schema.clone())
-            });
+        let schema = if plan.inputs().is_empty() {
+            // When predicates are pushed into a table scan, there needs to be

Review Comment:
   If I made this use the empty schema whenever the plan node has no inputs,
the test `csv_query_group_by_and_having_and_where` would fail, since it
attempts to simplify predicates that are inlined into a scan:
   
https://github.com/apache/arrow-datafusion/blob/caa60337c7a57572d93d8bd3cbc18006aabe55e6/datafusion/expr/src/logical_plan/plan.rs#L1428-L1429
   
   It seems to me that inlined scan filters are a bit of an exception, in that
they are evaluated on top of the scan itself.
   
   For other kinds of nodes (e.g., `Values`) I think you're right, though, so I
pushed a commit that refines the logic a bit more to use the plan's schema only
for table scans.
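
To make the rule above concrete, here is a minimal sketch on a toy plan model. It is not the code in the PR and does not use DataFusion's types: `Plan`, `Schema`, `merge_schemas`, and `resolution_schema` are all hypothetical names, and using the merged child schema for nodes that do have inputs is an assumption based on the PR title and the removed `merge_schema` call.

```rust
// Toy model (assumption): a schema is just a list of qualified column names.
type Schema = Vec<String>;

// Hypothetical stand-in for a logical plan; not DataFusion's `LogicalPlan`.
enum Plan {
    TableScan { schema: Schema },
    Values { schema: Schema },
    Join { left: Box<Plan>, right: Box<Plan>, schema: Schema },
}

impl Plan {
    // Output schema of the node itself.
    fn schema(&self) -> &Schema {
        match self {
            Plan::TableScan { schema }
            | Plan::Values { schema }
            | Plan::Join { schema, .. } => schema,
        }
    }

    // Child nodes; leaf nodes have none.
    fn inputs(&self) -> Vec<&Plan> {
        match self {
            Plan::Join { left, right, .. } => vec![left, right],
            _ => vec![],
        }
    }
}

// Union of the children's columns, keeping first-seen order.
fn merge_schemas(inputs: &[&Plan]) -> Schema {
    let mut merged = Schema::new();
    for input in inputs {
        for column in input.schema() {
            if !merged.contains(column) {
                merged.push(column.clone());
            }
        }
    }
    merged
}

// Pick the single schema used to resolve a node's expressions:
// - table scan: its own schema, because inlined filters run on top of the scan
// - any other leaf (e.g. `Values`): the empty schema
// - everything else: the merged schema of its inputs (assumed here)
fn resolution_schema(plan: &Plan) -> Schema {
    match plan {
        Plan::TableScan { schema } => schema.clone(),
        _ if plan.inputs().is_empty() => Schema::new(),
        _ => merge_schemas(&plan.inputs()),
    }
}
```

In this model a semi join whose own output schema holds only `t1.id` still resolves `t1.id = t2.id`, because the merged child schema contains both columns.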



