sarahyurick opened a new issue, #5583: URL: https://github.com/apache/arrow-datafusion/issues/5583
Hi, I was wondering if it's currently possible to change or update `OptimizerRule` fields within the `try_optimize` function? Below, I've detailed the issues I'm running into and some things I've tried that didn't work. **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I'm currently working on an optimizer rule for the Dask-SQL project, but I'm running into some annoying logic with DataFusion's optimizer. Basically, I'd like to be able to constantly update a field of the new rule's struct within the `try_optimize` function, but because it takes in a reference to the struct instead of the struct itself ([source code](https://github.com/apache/arrow-datafusion/blob/d0bd28eef34ca001d5e43d2732761a1b4cf09c71/datafusion/optimizer/src/optimizer.rs#L53)), the field can't be updated. To give a high-level example, let's say I have an optimizer rule: ``` pub struct MyRule { my_set: HashSet<Expr>, } ``` It would be nice to be able to use `ApplyOrder::TopDown` with it, but as I'm going down the `LogicalPlan` I want to insert any filters from any `Join` in the plan to `my_set`, so that this information can later be used to update a `TableScan`. However, within the `try_optimize` function, doing something like `self.my_set.insert(...)` isn't valid because it's a `&MyRule` and not a `MyRule`. **Describe the solution you'd like** I'm not sure if there's already a way to modify these fields when desired, but from my experimentation I couldn't get it to work. One solution I imagine is to change [`&self`](https://github.com/apache/arrow-datafusion/blob/d0bd28eef34ca001d5e43d2732761a1b4cf09c71/datafusion/optimizer/src/optimizer.rs#L61) to `self`, although I'm not sure if that's desirable. **Describe alternatives you've considered** I also tried within the `MyRule` implementation to create new instances of `MyRule` with logic like: ``` fn try_optimize( &self, plan: &LogicalPlan, _config: &dyn OptimizerConfig, ) -> Result<Option<LogicalPlan>> { let my_expr = get_expr_func(&plan); let mut new_set = self.my_set.clone(); new_set.insert(my_expr); let new_self = MyRule { my_set: new_set }; // Some type of conditions, extra logic, etc. for when to recurse with try_optimize &new_self.try_optimize(plan, _config) } ``` but was seeing some wonky logic where the updates would only carry to the next step and not through the rest of the steps down the plan. However, a similar solution could be possible. I also tried something like: ``` impl MyRule { pub fn new() -> Self { Self { my_set: HashSet::new() } } pub fn set_myset(mut self, value: HashSet<Expr>) { self.my_set = value; } } impl OptimizerRule for MyRule { fn name(&self) -> &str { "my_rule" } fn apply_order(&self) -> Option<ApplyOrder> { Some(ApplyOrder::TopDown) } fn try_optimize( &self, plan: &LogicalPlan, _config: &dyn OptimizerConfig, ) -> Result<Option<LogicalPlan>> { let new_set = HashSet::from([...]); self.set_myset(new_set); } } ``` which didn't work either. **Additional context** This isn't blocking any of my work on the Dask-SQL side; it's just that currently, the workaround I have to use requires a lot more lines of code than I would imagine should be needed. @andygrove @jdye64 I wanted to know your thoughts on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
