This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new fdd36d0d21 Update comments on OptimizerRule about function name 
matching (#20346)
fdd36d0d21 is described below

commit fdd36d0d2198d1701fffdb02901b20e0611b5468
Author: Andrew Lamb <[email protected]>
AuthorDate: Tue Feb 24 14:46:32 2026 -0500

    Update comments on OptimizerRule about function name matching (#20346)
    
    ## Which issue does this PR close?
    
    - Related to  https://github.com/apache/datafusion/pull/20180
    
    
    
    ## Rationale for this change
    
    I gave feedback to @devanshu0987
    https://github.com/apache/datafusion/pull/20180/changes#r2800720037 that
    it was not a good idea to check for function names in optimizer rules,
    but then I realized that the rationale for this is not written down
    anywhere.
    
    ## What changes are included in this PR?
    
    Document why checking for function names in optimizer rules is not good
    and offer alternatives
    
    ## Are these changes tested?
    
    By CI
    
    ## Are there any user-facing changes?
    
    Just docs, no functional changes
---
 datafusion/optimizer/src/optimizer.rs | 46 +++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/datafusion/optimizer/src/optimizer.rs 
b/datafusion/optimizer/src/optimizer.rs
index 0da4d6352a..bdea6a8307 100644
--- a/datafusion/optimizer/src/optimizer.rs
+++ b/datafusion/optimizer/src/optimizer.rs
@@ -58,12 +58,12 @@ use crate::simplify_expressions::SimplifyExpressions;
 use crate::single_distinct_to_groupby::SingleDistinctToGroupBy;
 use crate::utils::log_plan;
 
-/// `OptimizerRule`s transforms one [`LogicalPlan`] into another which
-/// computes the same results, but in a potentially more efficient
-/// way. If there are no suitable transformations for the input plan,
-/// the optimizer should simply return it unmodified.
+/// Transforms one [`LogicalPlan`] into another which computes the same 
results,
+/// but in a potentially more efficient way.
 ///
-/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`]
+/// See notes on [`Self::rewrite`] for details on how to implement an 
`OptimizerRule`.
+///
+/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`].
 ///
 /// Use [`SessionState::add_optimizer_rule`] to register additional
 /// `OptimizerRule`s.
@@ -88,8 +88,40 @@ pub trait OptimizerRule: Debug {
         true
     }
 
-    /// Try to rewrite `plan` to an optimized form, returning 
`Transformed::yes`
-    /// if the plan was rewritten and `Transformed::no` if it was not.
+    /// Try to rewrite `plan` to an optimized form, returning 
[`Transformed::yes`]
+    /// if the plan was rewritten and [`Transformed::no`] if it was not.
+    ///
+    /// # Notes for implementations:
+    ///
+    /// ## Return the same plan if no changes were made
+    ///
+    /// If there are no suitable transformations for the input plan,
+    /// the optimizer should simply return it unmodified.
+    ///
+    /// The optimizer will call `rewrite` several times until a fixed point is
+    /// reached, so it is important that `rewrite` return [`Transformed::no`] 
if
+    /// the output is the same.
+    ///
+    /// ## Matching on functions
+    ///
+    /// The rule should avoid function-specific transformations, and instead 
use
+    /// methods on [`ScalarUDFImpl`] and [`AggregateUDFImpl`]. Specifically, 
the
+    /// rule should not check function names as functions can be overridden, 
and
+    /// may not have the same semantics as the functions provided with
+    /// DataFusion.
+    ///
+    /// For example, if a rule rewrites a function based on the check
+    /// `func.name() == "sum"`, it may rewrite the plan incorrectly if the
+    /// registered `sum` function has different semantics (for example, the
+    /// `sum` function from the `datafusion-spark` crate).
+    ///
+    /// There are still several cases that rely on function name checking in
+    /// the rules included with DataFusion. Please see [#18643] for more 
details
+    /// and to help remove these cases.
+    ///
+    /// [`ScalarUDFImpl`]: datafusion_expr::ScalarUDFImpl
+    /// [`AggregateUDFImpl`]: datafusion_expr::ScalarUDFImpl
+    /// [#18643]: https://github.com/apache/datafusion/issues/18643
     fn rewrite(
         &self,
         _plan: LogicalPlan,


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to