jkosh44 commented on PR #14289:
URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2625975316

   There's still the open question of how window and aggregate functions should 
treat `NullBehavior::Propagate`. Table functions don't use the `Signature` 
struct, so we can ignore them for now. However, the one builtin table function 
re-implements the same idea:  
https://github.com/apache/datafusion/blob/11435ded45434edd34ed0b7738b9b9b2194f1774/datafusion/functions-table/src/generate_series.rs#L33-L40.
 I have a couple of proposals, @jayzhan211 let me know if you have any 
preferences.
   
   **Approach 1**. Window and aggregate functions ignore 
`NullHandling::Propagate` and maybe log a warning if it's set.
   
   **Approach 2**. Aggregate functions return a final result of `Null` if any 
of the input is `Null`. Window functions emit `Null` for all rows after 
encountering the first `Null`.
   
   **Approach 3**. Aggregate functions and window functions skip `Null` input, 
i.e. they don't update their state. With this approach, the name `Propagate` 
may not be the best and we may want to consider changing it to something more 
general.
   
   **Approach 4**. Remove `null_handling` from `Signature` and add a new method 
to the `ScalarUDFImpl` with a default implementation like
   ```Rust
   pub trait ScalarUDFImpl: Debug + Send + Sync {
       ...
   
       fn null_handling(&self) -> NullHandling {
           NullHandling::PassThrough
       } 
   }
   ```
   then we wouldn't have to worry about aggregate and window functions. 
Additionally, if we do ever want to add something similar to aggregate and 
window functions, then we can use different enums for each function type that 
makes sense for that function type. For example,
   ```Rust
   enum ScalarNullHandling  {
       PassThrough,
       Propagate,
   }
   enum AggregateNullHandling {
       PassThrough,
       Propagate,
       Skip,
   }
   enum WindowNullHandling {
       PassThrough,
       Skip,
   }
   ```
   
   Personally, I'm leaning towards approach 4. However, I didn't actually 
understand above why it would be less of a breaking change to update 
`Signature` compared to updating `ScalarUDFImpl`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to