alamb commented on issue #5781: URL: https://github.com/apache/arrow-datafusion/issues/5781#issuecomment-1583105449
While working on https://github.com/apache/arrow-datafusion/pull/6592 a key design question about `WindowUDF` has arisen. Specifically: # Use existing API / Traits ## What this would mean: This would mean exposing (at least) [`BuiltInWindowFunctionExpr`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/physical-expr/src/window/built_in_window_function_expr.rs#L31) and [`PartitionEvaluator`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/physical-expr/src/window/partition_evaluator.rs#L29) directly for people to implement a `WindowUDF` This would mean making those traits `pub` and part of a WindowUDF and ## Pros 1. This is the same pattern used by [`AggregateUDF`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/expr/src/udaf.rs#L30-L40) (TODO link) that exposes the [`Accumulator`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/expr/src/accumulator.rs#L33) trait directly to implementors 1. Allow user defined window functions to access all the features and performance as built in window functions (both now and in the future) ## Cons `BuiltInWindowFunctionExpr` and `PartitionEvaluator` are are fairly complicated (as they support many features) which could make implementing user defined window functions harder Also, `BuiltInWindowFunctionExpr` and `PartitionEvaluator` would likely need some changes such as: 1. renamed so they weren't called "BuiltIn*" 2. Moving the trait definitions into `datafusion_expr` 3. Update the state management to be extendable (e.g. the enum for `BuiltinWindowState` would have to be updated to to allow user defined state somehow) # Make new APIs and traits ## What this would mean: In this case, we would make new traits that are subsets of what is in `BuiltInWindowFunctionExpr` and `PartitionEvaluator` that users would implement. Internally we would implement `BuiltInWindowFunctionExpr` and `PartitionEvaluator` in terms of these new traits. # Pros 1. We can tailor the API to precisely the needs of user defined window function writers 2. Would allow changing the internal APIs without having to change the WindowUDF API # Cons A new API would mean introducing another api layer that needs to be tested and kept up to date. The new API also might not expose all the functionality of the built in window functions if such functionality was added at a later date ## Discussion I am going to try and bash out a technical proof of concept of the first approach (exposing the existing APIs) and we can see how it would look. On the balance that is my preferred approach due to the power of the API and the similarities with `AggregateUDF`s -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
