yashmayya opened a new pull request, #14249: URL: https://github.com/apache/pinot/pull/14249
- Fixes https://github.com/apache/pinot/issues/11406. - Background reading - https://www.postgresql.org/docs/current/tutorial-window.html, https://www.postgresql.org/docs/current/functions-window.html, https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS (this one is most relevant to this PR). - Currently, Pinot's window function implementations have limited or even incorrect support for window frame bounds. For instance, `FIRST_VALUE` / `LAST_VALUE` assume that the window frame is always `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` even though the default window frame as per standard SQL is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. Furthermore, support for defining the lower bound explicitly as `UNBOUNDED PRECEDING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` and the upper bound as `UNBOUNDED FOLLOWING` / `CURRENT ROW` / `n FOLLOWING` / `n PRECEDING` does not exist. - This patch adds support for any custom bounds (offset based or otherwise) for `ROWS` window frames, and also adds support for `UNBOUNDED PRECEDING` / `CURRENT ROW` / `UNBOUNDED FOLLOWING` bounds for `RANGE` window frames. There are a ton of edge cases to be handled here but this patch attempts to add test cases to cover most of these scenarios. - Note that Calcite (and hence Pinot) only supports `ROWS` and `RANGE` based window frame bounds, whereas Postgres also supports `GROUPS`. - The planner side changes (mainly literal extraction) are built over https://github.com/apache/pinot/pull/14233/. - Apart from the need to add support for offset bounds for `RANGE` based window frames, another important future enhancement is to optimize the performance of `ROWS` based window frames for aggregate window functions where both the lower and upper bounds are offset based / current row. Since the changes in this patch are built over the existing framework for window functions where a "merger" is used to merge values for aggregate window functions, it isn't possible to use a sliding window based algorithm to efficiently compute aggregates for windows. This will require more significant changes to the framework but is critical to ensure performant computations especially for larger windows. Optimizations have been added in this patch to ensure that aggregation window functions over window frames with `UNBOUNDED PRECEDING` lower bound or `UNBOUNDED FOLLOWING` upper bound are computed efficiently. - Note that all the changes here only affect the aggregate window functions (`SUM`, `COUNT`, `MIN`, `MAX` etc.) and `FIRST_VALUE` / `LAST_VALUE`. The other window functions currently supported by Pinot (`LAG`, `LEAD`, `RANK`, `DENSE_RANK`, `ROW_NUMBER`) don't support custom window frame bounds and Calcite ensures that during query planning. - Calcite also does some other validation for window frame bounds like making sure lower bound isn't `UNBOUNDED FOLLOWING` / upper bound isn't `UNBOUNDED PRECEDING`, lower bound isn't `UNBOUNDED FOLLOWING` if upper bound is `UNBOUNDED PRECEDING` and vice versa etc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
