2010YOUY01 commented on PR #20086:
URL: https://github.com/apache/datafusion/pull/20086#issuecomment-3857890686

   Thank you for this POC; it helps clarify the issue.
   
   I see two benefits to unifying operators and UDFs:
   1. Optimizer rules would no longer need to differentiate between expressions 
and UDFs, which could simplify many optimizer implementations.
   2. A unified Operator/UDF API: as described in 
https://github.com/apache/datafusion/issues/20018, we already have good 
signature checking and user-friendly error message utilities for UDFs, but not 
for operators.
   
   This PR’s approach is to replace operator inside `Expr` with UDFs during the 
`SQL -> Logical Plan` stage. Conceptually, I think this is the ideal way 
operator expr should have been implemented from the start. However, doing this 
now would require migrating all related optimizer implementations for that 
operator. Based on some code searching, I believe this is effectively 
infeasible: changes that involve deep optimizer refactoring are particularly 
hard to review, and at the moment there likely isn’t enough review capacity for 
such a large and complex change.
   
   As a middle ground, I have another idea: operator `struct`s that implement 
`PhysicalExpr` could be thin wrappers around `ScalarUDF`, for example:
   
   ```rust
   #[derive(Debug, Clone, Eq)]
   pub struct BinaryExpr {
       left: Arc<dyn PhysicalExpr>,
       op: Operator,
       right: Arc<dyn PhysicalExpr>,
       /// Specifies whether an error is returned on overflow or not
       fail_on_overflow: bool,
       inner: Arc<dyn ScalarUDF>,
   }
   ```
   
   Here, `inner` would handle the heavy lifting such as signature checking and 
execution, while the outer `BinaryExpr` would mainly delegate to `inner` to 
implement `PhysicalExpr`.
   
   This way, operators and UDFs could share a consistent interface and reuse 
utilities like signature checking and user-friendly error messages. I think 
this approach is feasible, though I may be underestimating the difficulty since 
I haven’t worked extensively in these modules; I believe @Jefffrey has a deeper 
understanding here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to