viirya commented on code in PR #8410:
URL: https://github.com/apache/arrow-datafusion/pull/8410#discussion_r1413396097
##########
datafusion/expr/src/window_frame.rs:
##########
@@ -148,12 +148,22 @@ impl WindowFrame {
pub fn regularize(mut frame: WindowFrame, order_bys: usize) ->
Result<WindowFrame> {
if frame.units == WindowFrameUnits::Range && order_bys != 1 {
// Normally, RANGE frames require an ORDER BY clause with exactly one
- // column. However, an ORDER BY clause may be absent in two edge cases.
+ // column. However, an ORDER BY clause may be absent or present but
with
+ // more than one column in two edge cases:
+ // 1. start bound is UNBOUNDED or CURRENT ROW
+ // 2. end bound is CURRENT ROW or UNBOUNDED.
+ // In these cases, we regularize the RANGE frame to be equivalent to a
ROWS
+ // frame with the UNBOUNDED bounds.
+ // Note that this follows Postgres behavior.
if (frame.start_bound.is_unbounded()
|| frame.start_bound == WindowFrameBound::CurrentRow)
&& (frame.end_bound == WindowFrameBound::CurrentRow
|| frame.end_bound.is_unbounded())
{
+ // If an ORDER BY clause is absent, the frame is equivalent to a
ROWS
+ // frame with the UNBOUNDED bounds.
+ // If an ORDER BY clause is present but has more than one column,
the
+ // frame is unchanged.
Review Comment:
Actually, based on Postgres behavior, it doesn't change the RANGE frame to
ROWS frame (see the added tests in sqllogictest which return different results
than Postgres). Based on Postgres doc, without `ORDER BY`, Postgres treats all
rows of the partition as peers of the current row
(https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS).
##########
datafusion/expr/src/window_frame.rs:
##########
@@ -148,12 +148,22 @@ impl WindowFrame {
pub fn regularize(mut frame: WindowFrame, order_bys: usize) ->
Result<WindowFrame> {
if frame.units == WindowFrameUnits::Range && order_bys != 1 {
// Normally, RANGE frames require an ORDER BY clause with exactly one
- // column. However, an ORDER BY clause may be absent in two edge cases.
+ // column. However, an ORDER BY clause may be absent or present but
with
+ // more than one column in two edge cases:
+ // 1. start bound is UNBOUNDED or CURRENT ROW
+ // 2. end bound is CURRENT ROW or UNBOUNDED.
+ // In these cases, we regularize the RANGE frame to be equivalent to a
ROWS
+ // frame with the UNBOUNDED bounds.
+ // Note that this follows Postgres behavior.
if (frame.start_bound.is_unbounded()
|| frame.start_bound == WindowFrameBound::CurrentRow)
&& (frame.end_bound == WindowFrameBound::CurrentRow
|| frame.end_bound.is_unbounded())
{
+ // If an ORDER BY clause is absent, the frame is equivalent to a
ROWS
+ // frame with the UNBOUNDED bounds.
+ // If an ORDER BY clause is present but has more than one column,
the
+ // frame is unchanged.
Review Comment:
Actually, based on Postgres behavior, it doesn't change the RANGE frame to
ROWS frame (see the added tests in sqllogictest which return different results
than Postgres).
Based on Postgres doc, without `ORDER BY`, Postgres treats all rows of the
partition as peers of the current row
(https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]