kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188826793
########## docs/source/library-user-guide/upgrading.md: ########## @@ -62,6 +62,36 @@ DataFusionError::SchemaError( [#16652]: https://github.com/apache/datafusion/issues/16652 +#### `ScalarUDFImpl::equals` Default Implementation + +The default implementation of the `equals` method in the `ScalarUDFImpl` trait has been updated. Previously, it compared only the type IDs, names, and signatures of UDFs. Now, it assumes UDFs are not equal unless their pointers are the same. Review Comment: That's a thoughtful consideration. You raise valid points about predictability vs. convenience. Let me outline the tradeoffs: Current approach (safe default with pointer comparison): ✅ Prevents silent bugs where ScalarUDFs with different internal state are incorrectly considered equal ✅ Forces developers to think about equality semantics for their specific UDFs ✅ Common expression elimination works correctly out of the box ❌ Requires manual action from users who want structural equality Alternative approach (require explicit PartialEq implementation): ✅ More explicit and predictable - no hidden behavior changes ✅ Leverages Rust's type system for compile-time guarantees ✅ Clear contract: if you want equality comparison, implement PartialEq ❌ Breaking change requiring immediate action from all ScalarUDF implementors ❌ More work for simple ScalarUDFs that don't need custom equality logic My recommendation: Stick with the current approach for DataFusion 49.0.0, but consider the PartialEq approach for a future major version. Here's why: The current change already enables critical functionality (common expression elimination) without breaking compilation The upgrade path is clear and documented with examples For DataFusion 50.0.0 or 51.0.0, we could introduce a more explicit API that requires PartialEq implementation This gives users time to adapt while ensuring the optimizer works correctly immediately. What do you think about this phased approach? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org