alamb commented on issue #20244:
URL: https://github.com/apache/datafusion/issues/20244#issuecomment-3879087236

   I used codex to track down that this issue started appearing with this PR:
   - eb30c19b30 / https://github.com/apache/datafusion/pull/19287
   
   Specifically, at 
   
   - 6fa9c1ad11: EXPLAIN has SortExec above AggregateExec, output is correctly ordered.
   
   ```sql
   +---+-----------------------------+-----------------------+
   | x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
   +---+-----------------------------+-----------------------+
   | 1 | 0                           | 20                    |
   | 1 | 1                           | 40                    |
   | 2 | 0                           | 50                    |
   | 2 | 1                           | 100                   |
   +---+-----------------------------+-----------------------+
   
   explain SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   +---------------+-------------------------------+
   | plan_type     | plan                          |
   +---------------+-------------------------------+
   | physical_plan | ┌───────────────────────────┐ |
   |               | │          SortExec         │ |
   |               | │    --------------------   │ |
   |               | │    x@0 ASC NULLS LAST,    │ |
   |               | │      agg_src_sorted.y     │ |
   |               | │   % Int64(2)@1 ASC NULLS  │ |
   |               | │            LAST           │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │       AggregateExec       │ |
   |               | │    --------------------   │ |
   |               | │           aggr:           │ |
   |               | │   sum(agg_src_sorted.v)   │ |
   |               | │                           │ |
   |               | │         group_by:         │ |
   |               | │ x, CAST(y AS Int64) % 2 as│ |
   |               | │  agg_src_sorted.y % Int64 │ |
   |               | │            (2)            │ |
   |               | │                           │ |
   |               | │        mode: Single       │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      │ |
   |               | │    --------------------   │ |
   |               | │          files: 1         │ |
   |               | │      format: parquet      │ |
   |               | └───────────────────────────┘ |
   |               |                               |
   +---------------+-------------------------------+
   1 row(s) fetched.
   Elapsed 0.026 seconds.
   ```
   
   Then, at the next commit here
   - eb30c19b30: SortExec is gone, output is misordered:
   ```sql
   SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   +---+-----------------------------+-----------------------+
   | x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
   +---+-----------------------------+-----------------------+
   | 1 | 1                           | 40                    |
   | 1 | 0                           | 20                    |
   | 2 | 1                           | 100                   |
   | 2 | 0                           | 50                    |
   +---+-----------------------------+-----------------------+
   4 row(s) fetched.
   Elapsed 0.029 seconds.
   
   > -- This query orders by an expression of y that breaks the ordering
   explain SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   +---------------+-------------------------------+
   | plan_type     | plan                          |
   +---------------+-------------------------------+
   | physical_plan | ┌───────────────────────────┐ |
   |               | │       AggregateExec       │ |
   |               | │    --------------------   │ |
   |               | │           aggr:           │ |
   |               | │   sum(agg_src_sorted.v)   │ |
   |               | │                           │ |
   |               | │         group_by:         │ |
   |               | │ x, CAST(y AS Int64) % 2 as│ |
   |               | │  agg_src_sorted.y % Int64 │ |
   |               | │            (2)            │ |
   |               | │                           │ |
   |               | │        mode: Single       │ |
   |               | └─────────────┬─────────────┘ |
   |               | ┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      │ |
   |               | │    --------------------   │ |
   |               | │          files: 1         │ |
   |               | │      format: parquet      │ |
   |               | └───────────────────────────┘ |
   |               |                               |
   +---------------+-------------------------------+
   1 row(s) fetched.
   Elapsed 0.016 seconds.
   ```
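
For illustration only, here is a minimal Python sketch of why dropping the SortExec is unsafe for this query. It does not model how AggregateExec works internally. The rows are hypothetical (the actual contents of `agg_src_sorted` are not shown here), chosen only so that the sums match the output above, and the input is assumed to be sorted by `(x, y)`. The point is that `y % 2` is not monotonic in `y`, so groups emitted in input order are not ordered by `(x, y % 2)`.

```python
# Hypothetical rows (x, y, v); NOT the real table contents.
# Assumed to arrive sorted by (x, y), like the sorted source file.
rows = [(1, 1, 40), (1, 2, 20), (2, 1, 100), (2, 2, 50)]

# Sanity check: the input really is sorted by (x, y).
assert rows == sorted(rows, key=lambda r: (r[0], r[1]))


def group_key(row):
    """Group key used by the query: (x, CAST(y AS BIGINT) % 2)."""
    x, y, _ = row
    return (x, y % 2)


# Aggregate in input order; a dict keeps groups in first-seen order,
# which stands in for "emit groups without a final sort".
sums = {}
for row in rows:
    key = group_key(row)
    sums[key] = sums.get(key, 0) + row[2]

emitted = list(sums.items())
keys = [k for k, _ in emitted]

# The emitted group keys are not sorted by (x, y % 2).
is_sorted = keys == sorted(keys)

print(emitted)
print("sorted by (x, y % 2):", is_sorted)
```

With these assumed rows the groups come out as `(1, 1)`, `(1, 0)`, `(2, 1)`, `(2, 0)`, the same misordering seen at eb30c19b30. An ordering on `(x, y)` only carries over to `(x, f(y))` when `f` preserves order, which `% 2` does not.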
   