zembunia opened a new pull request, #4155: URL: https://github.com/apache/arrow-datafusion/pull/4155
# Which issue does this PR close?

This PR adds support for the `GROUPS` mode in window frames, which was a missing item in the #3570 enhancement. The `GROUPS` mode is implemented according to the specification in [PostgreSQL window function calls](https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS).

# Rationale for this change

This change is part of enhancement #361, which is on the roadmap.

# What changes are included in this PR?

The single common method that calculated the window range (`calculate_range`) is removed from the window_expr. New structs that can hold state information for each window frame mode are introduced.

The `ROWS` mode does not require state because it is a simple row index calculation, so its state struct is empty apart from the `calculate_range` method specific to `ROWS` mode.

For the `RANGE` mode, a stateful calculation can be utilized in the future. For now, the state struct is empty and the mode-specific `calculate_range` implementation is moved to the state struct.

For the `GROUPS` mode, a stateful implementation is provided that keeps track of the moving window range of groups for each consecutive row.

Frame exclusion is still not supported.

# Observations

The `RANGE` mode could also use a stateful implementation instead of calculating the window range for each row from scratch.

# Future work

- Stateful `RANGE` mode implementation.
- This PR implements a method that finds the next group index using an exponentially growing step size (`find_next_group_and_start_index`). This method can be improved to choose an approach based on statistics about previous group sizes: it can search for the next group by advancing one row at a time (for small group sizes), use the exponentially growing step size, or set a base step size before growing exponentially. We can also create a benchmark to get insights about the crossover point.

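
The exponentially growing step size mentioned under Future work can be illustrated with a small standalone sketch. This is not the code from this PR: `next_group_start` and `group_starts` are hypothetical helpers that work on a plain slice and assume equal values are contiguous (as they would be after sorting by the `ORDER BY` columns). The search doubles its step until it overshoots the current group, then binary-searches the last interval for the exact boundary.

```rust
/// Hypothetical helper (not the PR's `find_next_group_and_start_index`).
/// Returns the index of the first element after `start` whose value differs
/// from `values[start]`, or `values.len()` if the group runs to the end.
/// Assumes equal values are contiguous in `values`.
fn next_group_start<T: PartialEq>(values: &[T], start: usize) -> usize {
    let n = values.len();
    if start >= n {
        return n;
    }
    let current = &values[start];

    // Phase 1: gallop forward, doubling the step while we stay in the group.
    let mut low = start; // always an index inside the current group
    let mut step = 1usize;
    let mut high = start + step;
    while high < n && values[high] == *current {
        low = high;
        step *= 2;
        high = low + step;
    }
    // `high` is now either past the end or an index outside the group.
    let mut high = high.min(n);

    // Phase 2: binary search in (low, high] for the first index outside
    // the group. Invariant: values[low] is in the group, `high` is not.
    while high - low > 1 {
        let mid = low + (high - low) / 2;
        if values[mid] == *current {
            low = mid;
        } else {
            high = mid;
        }
    }
    high
}

/// Collects the start index of every group by repeatedly jumping
/// to the next group boundary.
fn group_starts<T: PartialEq>(values: &[T]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut idx = 0;
    while idx < values.len() {
        starts.push(idx);
        idx = next_group_start(values, idx);
    }
    starts
}
```

For a group of size g, the galloping phase takes about log2(g) comparisons and the binary search about as many again, whereas a one-by-one scan takes g. For very small groups the linear scan is likely cheaper in practice, which is the crossover point the benchmark idea above would try to locate.
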
# Are these changes tested?

New unit tests for the added functionality are in `window_frame_state.rs`. The tests in `windows.rs` are extended to cover the `GROUPS` mode, and a test file is added to the integration test SQLs.

# Are there any user-facing changes?

No
