jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2625975316
There's still the open question of how window and aggregate functions should treat `NullBehavior::Propagate`. Table functions don't use the `Signature` struct, so we can ignore them for now. However, the one builtin table function re-implements the same idea: https://github.com/apache/datafusion/blob/11435ded45434edd34ed0b7738b9b9b2194f1774/datafusion/functions-table/src/generate_series.rs#L33-L40. I have a couple of proposals, @jayzhan211 let me know if you have any preferences. **Approach 1**. Window and aggregate functions ignore `NullHandling::Propagate` and maybe log a warning if it's set. **Approach 2**. Aggregate functions return a final result of `Null` if any of the input is `Null`. Window functions emit `Null` for all rows after encountering the first `Null`. **Approach 3**. Aggregate functions and window functions skip `Null` input, i.e. they don't update their state. With this approach, the name `Propagate` may not be the best and we may want to consider changing it to something more general. **Approach 4**. Remove `null_handling` from `Signature` and add a new method to the `ScalarUDFImpl` with a default implementation like ```Rust pub trait ScalarUDFImpl: Debug + Send + Sync { ... fn null_handling(&self) -> NullHandling { NullHandling::PassThrough } } ``` then we wouldn't have to worry about aggregate and window functions. Additionally, if we do ever want to add something similar to aggregate and window functions, then we can use different enums for each function type that makes sense for that function type. For example, ```Rust enum ScalarNullHandling { PassThrough, Propagate, } enum AggregateNullHandling { PassThrough, Propagate, Skip, } enum WindowNullHandling { PassThrough, Skip, } ``` Personally, I'm leaning towards approach 4. However, I didn't actually understand above why it would be less of a breaking change to update `Signature` compared to updating `ScalarUDFImpl`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org