wolffcm commented on code in PR #6077:
URL: https://github.com/apache/arrow-datafusion/pull/6077#discussion_r1174187518
##########
datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs:
##########
@@ -61,16 +63,14 @@ impl SimplifyExpressions {
plan: &LogicalPlan,
execution_props: &ExecutionProps,
) -> Result<LogicalPlan> {
- // Pass down the `children merge schema` and `plan schema` to evaluate expression types.
- // pass all `child schema` and `plan schema` isn't enough, because like `t1 semi join t2 on
- // on t1.id = t2.id`, each individual schema can't contain all the columns in it.
- let children_merge_schema = DFSchemaRef::new(merge_schema(plan.inputs()));
- let schemas = vec![plan.schema(), &children_merge_schema];
- let info = schemas
- .into_iter()
- .fold(SimplifyContext::new(execution_props), |context, schema| {
- context.with_schema(schema.clone())
- });
+ let schema = if plan.inputs().is_empty() {
+ // When predicates are pushed into a table scan, there needs to be
Review Comment:
If I changed this to use the empty schema whenever the plan node has no inputs, the test `csv_query_group_by_and_having_and_where` would fail, because it simplifies predicates that are inlined into a scan:
https://github.com/apache/arrow-datafusion/blob/caa60337c7a57572d93d8bd3cbc18006aabe55e6/datafusion/expr/src/logical_plan/plan.rs#L1428-L1429
It seems that inlined scan filters are a bit of an exception, in that they are evaluated on top of the scan itself.
For other kinds of nodes (e.g., `Values`) I think you're right, though, so I pushed a commit that refines the logic further to use the plan's own schema only for table scans.
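To make the rule concrete, here is a minimal sketch of the schema choice described above: a table scan types its inlined filters against its own schema, while every other node, including input-less ones such as `Values`, uses the merged schemas of its inputs (empty when there are none). It does not use DataFusion's real types and is not the code from the pushed commit; `Plan`, `merge_schema`, and `simplification_schema` below are hypothetical simplifications, with a schema modeled as a plain list of column names.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified stand-in for DataFusion's `LogicalPlan`.
/// A "schema" here is just an ordered list of column names.
#[derive(Debug)]
pub enum Plan {
    /// Leaf scan; filters inlined into it are evaluated on top of the scan.
    TableScan { schema: Vec<String> },
    /// Leaf node holding literal rows; it has no inputs.
    Values { schema: Vec<String> },
    /// Any other node (filter, join, ...) with one or more inputs.
    Node { schema: Vec<String>, inputs: Vec<Plan> },
}

impl Plan {
    /// Output schema of this node.
    pub fn schema(&self) -> &[String] {
        match self {
            Plan::TableScan { schema }
            | Plan::Values { schema }
            | Plan::Node { schema, .. } => schema.as_slice(),
        }
    }

    /// Direct children of this node (empty for leaves).
    pub fn inputs(&self) -> Vec<&Plan> {
        match self {
            Plan::Node { inputs, .. } => inputs.iter().collect(),
            _ => Vec::new(),
        }
    }
}

/// Hypothetical merge: concatenates the input schemas, keeping only the
/// first occurrence of each column name.
pub fn merge_schema(inputs: &[&Plan]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for input in inputs {
        for col in input.schema() {
            if seen.insert(col.clone()) {
                merged.push(col.clone());
            }
        }
    }
    merged
}

/// Picks the schema used to resolve expression types while simplifying `plan`.
pub fn simplification_schema(plan: &Plan) -> Vec<String> {
    match plan {
        // Inlined scan filters sit on top of the scan, so they are typed
        // against the scan's own schema.
        Plan::TableScan { .. } => plan.schema().to_vec(),
        // Everything else resolves against its merged input schemas;
        // for input-less nodes such as `Values` that merge is empty.
        _ => merge_schema(&plan.inputs()),
    }
}
```

The `Node` case also mirrors the semi-join concern in the removed comment: for `t1 semi join t2 on t1.id = t2.id`, neither input alone contains every referenced column, but the merged list does.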