alamb commented on issue #5781:
URL: 
https://github.com/apache/arrow-datafusion/issues/5781#issuecomment-1581428228

   Here is a proposed design.
   
   # LogicalPlan / Expr
   
   [The current (`Expr`) representation of window 
functions](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/expr.rs#L157):
   
   ```rust
   #[derive(Clone, PartialEq, Eq, Hash)]
   pub enum Expr {
   ...
       /// Represents the call of a window function with arguments.
       WindowFunction(WindowFunction),
   }
   ```
   
   Which is 
[defined](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/expr.rs#L448-L458)
 like:
   
   ```rust
   /// Window function
   #[derive(Clone, PartialEq, Eq, Hash)]
   pub struct WindowFunction {
       /// Name of the function
       pub fun: window_function::WindowFunction,
       /// List of expressions to feed to the functions as arguments
       pub args: Vec<Expr>,
       /// List of partition by expressions
       pub partition_by: Vec<Expr>,
       /// List of order by expressions
       pub order_by: Vec<Expr>,
       /// Window frame
       pub window_frame: window_frame::WindowFrame,
   }
   ```
   
   Note that 
[`window_function::WindowFunction`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/window_function.rs#L35C26-L41)
 already handles user defined functions (`AggregateUDF`), so I propose to add 
an addutionl user defined window function variant there:
   
   ```rust
   /// WindowFunction (in `window_function`):
   #[derive(Debug, Clone, PartialEq, Eq, Hash)]
   pub enum WindowFunction {
       /// A built in aggregate function that leverages an aggregate function
       AggregateFunction(AggregateFunction),
       /// A a built-in window function
       BuiltInWindowFunction(BuiltInWindowFunction),
       /// A user defined aggregate function
       AggregateUDF(Arc<AggregateUDF>),
       /// A user defined aggregate function  <---- This is NEW
       WindowUDF(Arc<WindowUDF>),
   }
   ```
   
   # WindowUDF
   
   
   `WindowUDF` will be a structure that looks very similar to `AggregateUDF`:
   
   
https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/udaf.rs#L30-L40
   
   And similarly to the way that an `AggregateUDF` instantiates an 
[`Accumulator`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/accumulator.rs#L33)
 (via 
[`AccumulatorFunctionImplementation`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/function.rs#L49))
 the WindowUDF will need to provide a 
[`PartitionEvaluator`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/physical-expr/src/window/partition_evaluator.rs#L29-L107)
 instance for each partition of the data.
   
   ```rust
   pub trait PartitionEvaluator: Debug + Send {
   ```
   
   I am a little unclear on certain parts of the `PartitionEvaluator` AP 
(specifically, the stateful / stateless partition evaluation) and if we can 
make it straight forward to implement this for datafusion
   
   Looking at how this code is used in the `physical_plan` module, what is 
needed is something that implements the 
[`BuiltInWindowFunctionExpr`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/physical-expr/src/window/built_in_window_function_expr.rs#L31).
 I am not sure yet if it makes sense to have `WindowUDF` do so directly of if 
another structure would be better
   
   Here are the next steps I plan:
   
   So steps:
   - [ ] Add some additional docstrings to `PartitionEvaluator` which will both 
improve the code, and allow me to sort out some of my questions.
   - [ ] Propose renaming `BuiltInWindowFunctionExpr` to `WindowFunctionExpr` 
to reflect they will be used for more than BuiltIn
   - [ ] Propose renaming `BuiltinWindowState` to `WindowState` (in the same 
theory that it is not related to built ins)
   - [ ] Work on a POC / technical spike showing how this might work, including 
an end to end example
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to