alamb commented on issue #5781:
URL: 
https://github.com/apache/arrow-datafusion/issues/5781#issuecomment-1583105449

   While working on https://github.com/apache/arrow-datafusion/pull/6592 a key 
design question about `WindowUDF` has arisen. Specifically:
   
   # Use existing API / Traits
   
   ## What this would mean:
   
   This would mean exposing (at least)  
[`BuiltInWindowFunctionExpr`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/physical-expr/src/window/built_in_window_function_expr.rs#L31)
 and 
[`PartitionEvaluator`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/physical-expr/src/window/partition_evaluator.rs#L29)
 directly for people to implement a `WindowUDF` 
   
   This would mean making those traits `pub` and part of a WindowUDF and 
   
   ## Pros
   
   1. This is the same pattern used by 
[`AggregateUDF`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/expr/src/udaf.rs#L30-L40)
 (TODO link) that exposes the 
[`Accumulator`](https://github.com/apache/arrow-datafusion/blob/d689daf83f387d7cf1895b2fad6b347d216e47fc/datafusion/expr/src/accumulator.rs#L33)
 trait directly to implementors
   1. Allow user defined window functions to access all the features and 
performance as built in window functions (both now and in the future)
   
   ## Cons
   
   `BuiltInWindowFunctionExpr` and `PartitionEvaluator` are are fairly 
complicated (as they support many features) which could make implementing user 
defined window functions harder
   
   Also, `BuiltInWindowFunctionExpr` and `PartitionEvaluator` would likely need 
some changes such as:
   1. renamed so they weren't called "BuiltIn*"
   2. Moving the trait definitions into `datafusion_expr`
   3. Update the state management to be extendable (e.g. the enum for 
`BuiltinWindowState` would have to be updated to to allow user defined state 
somehow)
   
   
   
   # Make new APIs and traits
   
   ## What this would mean:
   
   In this case, we would make new traits that are subsets of what is in
   `BuiltInWindowFunctionExpr` and `PartitionEvaluator` that users would
   implement. Internally we would implement `BuiltInWindowFunctionExpr`
   and `PartitionEvaluator` in terms of these new traits.
   
   # Pros
   
   1. We can tailor the API to precisely the needs of user defined window 
function writers
   2. Would allow changing the internal APIs without having to change the 
WindowUDF API
   
   # Cons
   A new API would mean introducing another api layer that needs to be tested 
and kept up to date.
   
   The new API also might not expose all the functionality of the built in 
window functions if such functionality was added at a later date
   
   
   
   
   ## Discussion
   
   I am going to try and bash out a technical proof of concept of the first 
approach (exposing the existing APIs) and we can see how it would look. On the 
balance that is my preferred approach due to the power of the API and the 
similarities with `AggregateUDF`s
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to