Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

via GitHub Sun, 06 Jul 2025 19:09:29 -0700


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188826793



##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -62,6 +62,36 @@ DataFusionError::SchemaError(
 
 [#16652]: https://github.com/apache/datafusion/issues/16652
 
+#### `ScalarUDFImpl::equals` Default Implementation
+
+The default implementation of the `equals` method in the `ScalarUDFImpl` trait 
has been updated. Previously, it compared only the type IDs, names, and 
signatures of UDFs. Now, it assumes UDFs are not equal unless their pointers 
are the same.

Review Comment:
   That's a thoughtful consideration. You raise valid points about 
predictability vs. convenience. Let me outline the tradeoffs:
   
   Current approach (safe default with pointer comparison):
   
   ✅ Prevents silent bugs where ScalarUDFs with different internal state are 
incorrectly considered equal
   ✅ Forces developers to think about equality semantics for their specific UDFs
   ✅ Common expression elimination works correctly out of the box
   ❌ Requires manual action from users who want structural equality
   
   Alternative approach (require explicit PartialEq implementation):
   
   ✅ More explicit and predictable - no hidden behavior changes
   ✅ Leverages Rust's type system for compile-time guarantees
   ✅ Clear contract: if you want equality comparison, implement PartialEq
   ❌ Breaking change requiring immediate action from all ScalarUDF implementors
   ❌ More work for simple ScalarUDFs that don't need custom equality logic
   
   My recommendation: Stick with the current approach for DataFusion 49.0.0, 
but consider the PartialEq approach for a future major version. Here's why:
   
   The current change already enables critical functionality (common expression 
elimination) without breaking compilation
   The upgrade path is clear and documented with examples
   For DataFusion 50.0.0 or 51.0.0, we could introduce a more explicit API that 
requires PartialEq implementation
   This gives users time to adapt while ensuring the optimizer works correctly 
immediately. 
   
   What do you think about this phased approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

Reply via email to