mingmwang commented on PR #5171:
URL: 
https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1420625925

   @mustafasrepo 
   
   I found there was bug in the `EnforceSorting` rule, it was not introduced by 
this PR, the rule might change the final output ordering of the plan and will 
affect the correctness of the query result.
   
   Please take a look at the below test case and SQL
   
   `test_window_agg_sort_reversed_plan`
   
   SQL
   ```
   SELECT
       c9,
       SUM(c9) OVER(ORDER BY c9 ASC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum1,
       SUM(c9) OVER(ORDER BY c9 DESC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING) 
as sum2
       FROM aggregate_test_100
       LIMIT 5
   ```
   
   The optimized logical plan:
   
   ```
   Projection: aggregate_test_100.c9, SUM(aggregate_test_100.c9) ORDER BY 
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING 
AS sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS 
FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING AS sum2
     Limit: skip=0, fetch=5
       WindowAggr: windowExpr=[[SUM(aggregate_test_100.c9) ORDER BY 
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 
FOLLOWING]]
         WindowAggr: windowExpr=[[SUM(aggregate_test_100.c9) ORDER BY 
[aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 
FOLLOWING]]
           TableScan: aggregate_test_100 projection=[c9]
   ```
   
   The optimized physical plan:
   ```
   ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY 
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 
FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 
DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2],
     RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1,
       GlobalLimitExec: skip=0, fetch=5,
         BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { 
name: \"SUM(aggregate_test_100.c9)\", data_type: UInt64, nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { 
units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1)) 
}],
           BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { 
name: \"SUM(aggregate_test_100.c9)\", data_type: UInt64, nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { 
units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5)) 
}],
             SortExec: [c9@0 DESC], global=true
   ```
   
   Based on the logical plan, the final plan's output ordering should be `ORDER 
BY C9 ASC` and the `LIMIT 5` should gives the least 5 items. But after the 
physical optimization, the final plan's output ordering becomes  `ORDER BY C9 
DESC`  and the `LIMIT 5` gives the top 5 items.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to