alamb commented on issue #20244:
URL: https://github.com/apache/datafusion/issues/20244#issuecomment-3879087236
I used Codex to track down that this issue started appearing with this commit: eb30c19b30 / https://github.com/apache/datafusion/pull/19287

Specifically, at 6fa9c1ad11 (the commit before), EXPLAIN shows a SortExec above the AggregateExec, and the output is correctly ordered:

```sql
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 0                           | 20                    |
| 1 | 1                           | 40                    |
| 2 | 0                           | 50                    |
| 2 | 1                           | 100                   |
+---+-----------------------------+-----------------------+

explain SELECT x, CAST(y AS BIGINT) % 2, SUM(v) FROM agg_src_sorted GROUP BY x, CAST(y AS BIGINT) % 2 ORDER BY x, CAST(y AS BIGINT) % 2;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │          SortExec         │ |
|               | │    --------------------   │ |
|               | │    x@0 ASC NULLS LAST,    │ |
|               | │      agg_src_sorted.y     │ |
|               | │   % Int64(2)@1 ASC NULLS  │ |
|               | │            LAST           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │           aggr:           │ |
|               | │   sum(agg_src_sorted.v)   │ |
|               | │                           │ |
|               | │         group_by:         │ |
|               | │ x, CAST(y AS Int64) % 2 as│ |
|               | │  agg_src_sorted.y % Int64 │ |
|               | │            (2)            │ |
|               | │                           │ |
|               | │        mode: Single       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          files: 1         │ |
|               | │      format: parquet      │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.026 seconds.
```

Then, at the next commit, eb30c19b30, the SortExec is gone and the output is misordered:

```sql
SELECT x, CAST(y AS BIGINT) % 2, SUM(v) FROM agg_src_sorted GROUP BY x, CAST(y AS BIGINT) % 2 ORDER BY x, CAST(y AS BIGINT) % 2;
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 1                           | 40                    |
| 1 | 0                           | 20                    |
| 2 | 1                           | 100                   |
| 2 | 0                           | 50                    |
+---+-----------------------------+-----------------------+
4 row(s) fetched.
Elapsed 0.029 seconds.

> -- This query orders by an expression of y that breaks the ordering
explain SELECT x, CAST(y AS BIGINT) % 2, SUM(v) FROM agg_src_sorted GROUP BY x, CAST(y AS BIGINT) % 2 ORDER BY x, CAST(y AS BIGINT) % 2;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │           aggr:           │ |
|               | │   sum(agg_src_sorted.v)   │ |
|               | │                           │ |
|               | │         group_by:         │ |
|               | │ x, CAST(y AS Int64) % 2 as│ |
|               | │  agg_src_sorted.y % Int64 │ |
|               | │            (2)            │ |
|               | │                           │ |
|               | │        mode: Single       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │          files: 1         │ |
|               | │      format: parquet      │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.016 seconds.
```
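To illustrate why the SortExec is still required: rows sorted by `(x, y)` are not, in general, sorted by `(x, CAST(y AS BIGINT) % 2)`, because `% 2` is not monotonic in `y`. Below is a minimal Python sketch that checks this property on a few rows. The rows and the helpers `is_sorted_by`, `key_x_y` and `key_x_y_mod_2` are hypothetical and exist only for illustration. They are not DataFusion code, and the rows are not the actual contents of `agg_src_sorted`.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical (x, y) row


def is_sorted_by(rows: List[Row], key: Callable[[Row], Tuple[int, int]]) -> bool:
    """Return True if rows appear in non-decreasing order of key(row)."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def key_x_y(row: Row) -> Tuple[int, int]:
    # The ordering a sorted source provides: (x, y).
    x, y = row
    return (x, y)


def key_x_y_mod_2(row: Row) -> Tuple[int, int]:
    # Models (x, CAST(y AS BIGINT) % 2); Python's % matches SQL's only for non-negative y.
    x, y = row
    return (x, y % 2)


# Hypothetical input rows, already sorted by (x, y).
rows = [(1, 1), (1, 2), (2, 3), (2, 4)]

print(is_sorted_by(rows, key_x_y))        # True
print(is_sorted_by(rows, key_x_y_mod_2))  # False: within x = 1 the keys go (1, 1) then (1, 0)
```

In this toy data, the `y % 2 = 1` key appears before the `y % 2 = 0` key within each `x`. That is the same pattern as the misordered output at eb30c19b30.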
