sarahyurick opened a new issue, #5583:
URL: https://github.com/apache/arrow-datafusion/issues/5583

   Hi, I was wondering if it's currently possible to change or update 
`OptimizerRule` fields within the `try_optimize` function? Below, I've detailed 
the issues I'm running into and some things I've tried that didn't work.
   
   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   I'm currently working on an optimizer rule for the Dask-SQL project, but I'm 
running into some annoying logic with DataFusion's optimizer. Basically, I'd 
like to be able to constantly update a field of the new rule's struct within 
the `try_optimize` function, but because it takes in a reference to the struct 
instead of the struct itself ([source 
code](https://github.com/apache/arrow-datafusion/blob/d0bd28eef34ca001d5e43d2732761a1b4cf09c71/datafusion/optimizer/src/optimizer.rs#L53)),
 the field can't be updated. To give a high-level example, let's say I have an 
optimizer rule:
   
   ```
   pub struct MyRule {
       my_set: HashSet<Expr>,
   }
   ```
   
   It would be nice to be able to use `ApplyOrder::TopDown` with it, but as I'm 
going down the `LogicalPlan` I want to insert any filters from any `Join` in 
the plan to `my_set`, so that this information can later be used to update a 
`TableScan`. However, within the `try_optimize` function, doing something like 
`self.my_set.insert(...)` isn't valid because it's a `&MyRule` and not a 
`MyRule`.
   
   **Describe the solution you'd like**
   I'm not sure if there's already a way to modify these fields when desired, 
but from my experimentation I couldn't get it to work. One solution I imagine 
is to change 
[`&self`](https://github.com/apache/arrow-datafusion/blob/d0bd28eef34ca001d5e43d2732761a1b4cf09c71/datafusion/optimizer/src/optimizer.rs#L61)
 to `self`, although I'm not sure if that's desirable.
   
   **Describe alternatives you've considered**
   I also tried within the `MyRule` implementation to create new instances of 
`MyRule` with logic like:
   
   ```
   fn try_optimize(
           &self,
           plan: &LogicalPlan,
           _config: &dyn OptimizerConfig,
       ) -> Result<Option<LogicalPlan>> {
           let my_expr = get_expr_func(&plan);
           let mut new_set = self.my_set.clone();
           new_set.insert(my_expr);
           let new_self = MyRule { my_set: new_set };
           // Some type of conditions, extra logic, etc. for when to recurse 
with try_optimize
           &new_self.try_optimize(plan, _config)
   }
   ```
   
   but was seeing some wonky logic where the updates would only carry to the 
next step and not through the rest of the steps down the plan. However, a 
similar solution could be possible.
   
   I also tried something like:
   
   ```
   impl MyRule {
       pub fn new() -> Self {
           Self { my_set: HashSet::new() }
       }
       
       pub fn set_myset(mut self, value: HashSet<Expr>) {
           self.my_set = value;
       }
   }
   
   impl OptimizerRule for MyRule {
       fn name(&self) -> &str {
           "my_rule"
       }
   
       fn apply_order(&self) -> Option<ApplyOrder> {
           Some(ApplyOrder::TopDown)
       }
   
       fn try_optimize(
           &self,
           plan: &LogicalPlan,
           _config: &dyn OptimizerConfig,
       ) -> Result<Option<LogicalPlan>> {
           let new_set = HashSet::from([...]);
           self.set_myset(new_set);
       }
   }
   ```
   
   which didn't work either.
   
   **Additional context**
   This isn't blocking any of my work on the Dask-SQL side; it's just that 
currently, the workaround I have to use requires a lot more lines of code than 
I would imagine should be needed.
   
   @andygrove @jdye64 I wanted to know your thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to