alamb commented on code in PR #5623:
URL: https://github.com/apache/datafusion/pull/5623#discussion_r1666983056
##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -330,15 +324,14 @@ impl Optimizer {
}
log_plan(&format!("Optimized plan (pass {i})"), &new_plan);
- // TODO this is an expensive way to see if the optimizer did
anything and
- // it would be better to change the OptimizerRule trait to return
an Option
- // instead
- if old_plan.as_ref() == &new_plan {
+ // HashSet::insert returns, whether the value was newly inserted.
+ let plan_is_fresh =
+ previous_plans.insert(LogicalPlanSignature::new(&new_plan));
+ if !plan_is_fresh {
Review Comment:
Thank you for bringing this up @mslapek and @findepi
> Suboptimal plan can be orders of magnitude more expensive to execute, so
allowing it to run may cause unavailability for others, but I see your point.
It's especially difficult to transition from bug lenient treatment to more
strict. It should be gradual. I like the idea of having this initially
controlled with a flag. In tests and on CI this flag should be set to "fail".
Then we can switch the flag on for runtime as well.
My thoughts are that I can see the tradeoffs with both behavior:
1. Fail fast (raise an error if cycle is detected) that @findepi increases
the liklihood that such an error would be found and fixed quickly (as it would
prevent the query in question from running)
2. Lenient (ignore error) user still gets their answer (though maybe woudl
take much longer)
One middle ground might be as @findepi suggests and use a flag -- we could
default to raising an error if a cycle was detected but have a way for users to
ignore the error and proceed anyways. As long as the error message said how to
work around the error I think it would be fine.
In fact we have a similar setting already for failed optimizer rules, for
many of the same reasons discussed, that we could model the behavior on
`datafusion.optimizer.skip_failed_rules`:
https://datafusion.apache.org/user-guide/configs.html
datafusion.optimizer.skip_failed_rules | false | When set to true, the
logical plan optimizer will produce warning messages if any optimization rules
produce errors and then proceed to the next rule. When set to false, any rules
that produce errors will cause the query to fail
-- | -- | --
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]