mingmwang commented on PR #5171:
URL:
https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1420625925
@mustafasrepo
I found there was bug in the `EnforceSorting` rule, it was not introduced by
this PR, the rule might change the final output ordering of the plan and will
affect the correctness of the query result.
Please take a look at the below test case and SQL
`test_window_agg_sort_reversed_plan`
SQL
```
SELECT
c9,
SUM(c9) OVER(ORDER BY c9 ASC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING)
as sum1,
SUM(c9) OVER(ORDER BY c9 DESC ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING)
as sum2
FROM aggregate_test_100
LIMIT 5
```
The optimized logical plan:
```
Projection: aggregate_test_100.c9, SUM(aggregate_test_100.c9) ORDER BY
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING
AS sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS
FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING AS sum2
Limit: skip=0, fetch=5
WindowAggr: windowExpr=[[SUM(aggregate_test_100.c9) ORDER BY
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5
FOLLOWING]]
WindowAggr: windowExpr=[[SUM(aggregate_test_100.c9) ORDER BY
[aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5
FOLLOWING]]
TableScan: aggregate_test_100 projection=[c9]
```
The optimized physical plan:
```
ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY
[aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5
FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9
DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2],
RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1,
GlobalLimitExec: skip=0, fetch=5,
BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field {
name: \"SUM(aggregate_test_100.c9)\", data_type: UInt64, nullable: true,
dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame {
units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1))
}],
BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field {
name: \"SUM(aggregate_test_100.c9)\", data_type: UInt64, nullable: true,
dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame {
units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5))
}],
SortExec: [c9@0 DESC], global=true
```
Based on the logical plan, the final plan's output ordering should be `ORDER
BY C9 ASC` and the `LIMIT 5` should gives the least 5 items. But after the
physical optimization, the final plan's output ordering becomes `ORDER BY C9
DESC` and the `LIMIT 5` gives the top 5 items.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]