2010YOUY01 commented on PR #20086: URL: https://github.com/apache/datafusion/pull/20086#issuecomment-3857890686
Thank you for this POC; it helps clarify the issue. I see two benefits to unifying operators and UDFs: 1. Optimizer rules would no longer need to differentiate between expressions and UDFs, which could simplify many optimizer implementations. 2. A unified Operator/UDF API: as described in https://github.com/apache/datafusion/issues/20018, we already have good signature checking and user-friendly error message utilities for UDFs, but not for operators. This PR’s approach is to replace operator inside `Expr` with UDFs during the `SQL -> Logical Plan` stage. Conceptually, I think this is the ideal way operator expr should have been implemented from the start. However, doing this now would require migrating all related optimizer implementations for that operator. Based on some code searching, I believe this is effectively infeasible: changes that involve deep optimizer refactoring are particularly hard to review, and at the moment there likely isn’t enough review capacity for such a large and complex change. As a middle ground, I have another idea: operator `struct`s that implement `PhysicalExpr` could be thin wrappers around `ScalarUDF`, for example: ```rust #[derive(Debug, Clone, Eq)] pub struct BinaryExpr { left: Arc<dyn PhysicalExpr>, op: Operator, right: Arc<dyn PhysicalExpr>, /// Specifies whether an error is returned on overflow or not fail_on_overflow: bool, inner: Arc<dyn ScalarUDF>, } ``` Here, `inner` would handle the heavy lifting such as signature checking and execution, while the outer `BinaryExpr` would mainly delegate to `inner` to implement `PhysicalExpr`. This way, operators and UDFs could share a consistent interface and reuse utilities like signature checking and user-friendly error messages. I think this approach is feasible, though I may be underestimating the difficulty since I haven’t worked extensively in these modules; I believe @Jefffrey has a deeper understanding here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
