alamb opened a new issue, #4904: URL: https://github.com/apache/arrow-datafusion/issues/4904
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I am not sure this is an important feature to add, but I wanted to write it down. Basically there is an interesting algorithm called `Segment Tree` which is described in : > Efficient processing of window functions in analytical SQL queries > by Viktor Leis, Kan Kundhikanjana, Alfons Kemper, Thomas Neumann https://dl.acm.org/doi/10.14778/2794367.2794375 [p1058-leis.pdf](https://github.com/apache/arrow-datafusion/files/10417023/p1058-leis.pdf) This algorithm handles window functions with `RANGE` window functions well, at least that is the claim. It might be a reasonable structure to implement instead of the "MovingMinMax" added in https://github.com/apache/arrow-datafusion/pull/4675 by @berkaycpp and @mustafasrepo . **Describe the solution you'd like** If we hit unacceptable performance of window functions (especially with largely varying `RANGE`), this might an algorithm worth looking into. As a reminder a `RANGE` window frame is determined in terms of the values of the partition, not the number of rows: ```sql ❯ create table foo as values (1), (2), (3), (4), (5), (6), (6), (6); 0 rows in set. Query took 0.000 seconds. ❯ select column1, first_value(column1) OVER (ORDER BY column1 RANGE 3 PRECEDING) from foo; +---------+--------------------------+ | column1 | FIRST_VALUE(foo.column1) | +---------+--------------------------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 1 | | 5 | 2 | | 6 | 3 | | 6 | 3 | | 6 | 3 | +---------+--------------------------+ ❯ select column1, first_value(column1) OVER (ROWS 3 PRECEDING) from foo; +---------+--------------------------+ | column1 | FIRST_VALUE(foo.column1) | +---------+--------------------------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 1 | | 5 | 2 | | 6 | 3 | | 6 | 4 | | 6 | 5 | +---------+--------------------------+ 8 rows in set. Query took 0.001 seconds. ``` **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
