alamb commented on PR #5171:
URL: 
https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1421451352

   ```sql
    SELECT
       c9,
       SUM(c9) OVER(ORDER BY c9 ASC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum1,
       SUM(c9) OVER(ORDER BY c9 DESC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum2
       FROM aggregate_test_100
       LIMIT 5
   ```
   
   I think the output of this query is "implementation defined" according to 
the SQL spec -- it is correct to produce any arbitrary 5 rows out because there 
is no `ORDER BY` on the outer query. If the user wants a specific set of rows 
they ned to add an explicit `ORDER BY`.  
   
   ```sql
    SELECT
       c9,
       SUM(c9) OVER(ORDER BY c9 ASC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum1,
       SUM(c9) OVER(ORDER BY c9 DESC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum2
       FROM aggregate_test_100
       ORDER BY c9 ASC
       LIMIT 5
   ```
   
   This is a pretty bad UX choice on the part of SQL in my opinion, and causes 
all sorts of confusion (e.g. when test output changes from run to run!) but 
that is what the SQL spec says. 
   
   > I don't think this is a bug. 
   
   I agree. I think it is acceptable for the optimizer to rewrite the plan so 
the output happens to be different


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to