findepi commented on issue #16677:
URL: https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265

   How we can solve this:
   
   ## Add two new traits
   
   ```rust
   trait UdfHash {
       fn udf_hash(&self) -> u64;
   }
   
   trait UdfEq {
       fn udf_eq(&self, other: &dyn Any) -> bool;
   }
   ```
   
   with blanket implementations
   
   ```rust
   impl<T: Hash + Any> UdfHash for T {
       fn udf_hash(&self) -> u64 {
           let hasher = &mut DefaultHasher::new();
           self.type_id().hash(hasher);
           self.hash(hasher);
           hasher.finish()
       }
   }
   
   impl<T: PartialEq + Any> UdfEq for T {
       fn udf_eq(&self, other: &dyn Any) -> bool {
           let Some(other) = other.downcast_ref::<Self>() else {
               return false;
           };
           self == other
       }
   }
   ```
   
   ## Make `ScalarUDFImpl` depend on these
   
   ```rust
   trait ScalarUDFImpl: Send + Sync + UdfHash + UdfEq + Any {
    ...
   }
   ```
   
   ## Usage
   
   ```rust
   #[derive(Debug, Clone, PartialEq, Hash)]
   struct MyFunc {
       state: SomeEquitableStateType,
   }
   
   impl ScalarUDFImpl for MyFunc {
      // nothing eq/hash related needed here
   }
   ```
   
   As shown, the end result is that `ScalarUDFImpl` can be hashed and compared 
for equality and all the implementor needs is `#[derive(PartialEq, Hash)]` to 
make it work.
   
   ## Notes
   
   ### `UdfEq` accepts any `Any`
   
   `UdfEq` accepts any `Any` so it's possible to call it with instances other 
than `&dyn ScalarUDFImpl`.
   I don't know how to avoid this.
   In particular, this is not allowed in Rust:
   ```rust
   // Error: cycle detected when computing the super predicates of 
`ScalarUDFImpl` [E0391]
   trait ScalarUDFImpl: PartialEq<dyn ScalarUDFImpl> {
    /* ... */
   }
   ```
   
   We can cover this with additional eq/hash methods directly in 
`ScalarUDFImpl` trait that will have default implementation and will not need 
to be reimplemented.
   
   ```rust
   trait ScalarUDFImpl: Send + Sync + UdfHash + UdfEq + Any {
       /* ... */
       
       fn eq(&self, other: &dyn ScalarUDFImpl) -> bool {
           self.udf_eq(other)
       }
       
       fn hash(&self) -> u64 {
           self.udf_hash()
       }
   }
   ```
   
   
   ### Compiler errors don't make it obvious
   
   As with any trait with blanket implementations (such as `Into`), when you 
don't implement it yet, the compiler won't tell you how you should implement.
   For example, if a function lacks `PartialEq` implementation, the compiler 
will complain about `UdfEq` instead.
   
   ```rust
   // Missing eq implementation
   #[derive(Debug, Clone, /*PartialEq,*/ Hash)]
   struct MyFunc {
       safe: bool,
   }
   
   // Error: the trait bound `MyFunc: UdfEq` is not satisfied
   impl ScalarUDFImpl for MyFunc {
       /* ... */
   }
   ```
   
   It can be mitigated by a good documentation for the `UdfEq`, `UdfHash` 
traits informing about blanket implementation and that they should not be 
implemented directly.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to