alamb commented on issue #5781: URL: https://github.com/apache/arrow-datafusion/issues/5781#issuecomment-1581428228
Here is a proposed design. # LogicalPlan / Expr [The current (`Expr`) representation of window functions](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/expr.rs#L157): ```rust #[derive(Clone, PartialEq, Eq, Hash)] pub enum Expr { ... /// Represents the call of a window function with arguments. WindowFunction(WindowFunction), } ``` Which is [defined](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/expr.rs#L448-L458) like: ```rust /// Window function #[derive(Clone, PartialEq, Eq, Hash)] pub struct WindowFunction { /// Name of the function pub fun: window_function::WindowFunction, /// List of expressions to feed to the functions as arguments pub args: Vec<Expr>, /// List of partition by expressions pub partition_by: Vec<Expr>, /// List of order by expressions pub order_by: Vec<Expr>, /// Window frame pub window_frame: window_frame::WindowFrame, } ``` Note that [`window_function::WindowFunction`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/window_function.rs#L35C26-L41) already handles user defined functions (`AggregateUDF`), so I propose to add an addutionl user defined window function variant there: ```rust /// WindowFunction (in `window_function`): #[derive(Debug, Clone, PartialEq, Eq, Hash)] pub enum WindowFunction { /// A built in aggregate function that leverages an aggregate function AggregateFunction(AggregateFunction), /// A a built-in window function BuiltInWindowFunction(BuiltInWindowFunction), /// A user defined aggregate function AggregateUDF(Arc<AggregateUDF>), /// A user defined aggregate function <---- This is NEW WindowUDF(Arc<WindowUDF>), } ``` # WindowUDF `WindowUDF` will be a structure that looks very similar to `AggregateUDF`: https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/udaf.rs#L30-L40 And similarly to the way that an `AggregateUDF` instantiates an [`Accumulator`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/accumulator.rs#L33) (via [`AccumulatorFunctionImplementation`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/expr/src/function.rs#L49)) the WindowUDF will need to provide a [`PartitionEvaluator`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/physical-expr/src/window/partition_evaluator.rs#L29-L107) instance for each partition of the data. ```rust pub trait PartitionEvaluator: Debug + Send { ``` I am a little unclear on certain parts of the `PartitionEvaluator` AP (specifically, the stateful / stateless partition evaluation) and if we can make it straight forward to implement this for datafusion Looking at how this code is used in the `physical_plan` module, what is needed is something that implements the [`BuiltInWindowFunctionExpr`](https://github.com/apache/arrow-datafusion/blob/1d3860dc813725b5a987aae4c3a8f4a7b2bfdb2d/datafusion/physical-expr/src/window/built_in_window_function_expr.rs#L31). I am not sure yet if it makes sense to have `WindowUDF` do so directly of if another structure would be better Here are the next steps I plan: So steps: - [ ] Add some additional docstrings to `PartitionEvaluator` which will both improve the code, and allow me to sort out some of my questions. - [ ] Propose renaming `BuiltInWindowFunctionExpr` to `WindowFunctionExpr` to reflect they will be used for more than BuiltIn - [ ] Propose renaming `BuiltinWindowState` to `WindowState` (in the same theory that it is not related to built ins) - [ ] Work on a POC / technical spike showing how this might work, including an end to end example -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
