alamb opened a new issue, #20244:
URL: https://github.com/apache/datafusion/issues/20244
### Describe the bug
While debugging the DataFusion 52 upgrade, I found a wrong results bug with
pre-sorted data that was introduced in 52
### To Reproduce
```sql
CREATE TABLE agg_src(x INT, y INT, v INT) AS VALUES
(1, 1, 10),
(1, 2, 20),
(1, 3, 30),
(2, 1, 40),
(2, 2, 50),
(2, 3, 60);
-- Create an ordered table:
COPY (SELECT * FROM agg_src ORDER BY x, y) TO 'foo.parquet';
```
Then run
```sql
CREATE EXTERNAL TABLE agg_src_sorted(x INT, y INT, v INT) STORED AS PARQUET
LOCATION 'foo.parquet' WITH ORDER (x ASC, y ASC);
-- This query orders by an expresson of y that breaks the ordering
SELECT
x,
CAST(y AS BIGINT) % 2,
SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
```
With Datafusion 52, you get the wrong answer:
```shell
andrewlamb@Andrews-MacBook-Pro-3 ~ %
~/Software/datafusion-cli/datafusion-cli-52.1.0
```
```sql
> SELECT
x,
CAST(y AS BIGINT) % 2,
SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 1 | 40 |
| 1 | 0 | 20 | <---- the second
column is 1 then 0, rather than 0 then 1
| 2 | 1 | 100 |
| 2 | 0 | 50 |
+---+-----------------------------+-----------------------+
4 row(s) fetched.
Elapsed 0.006 seconds.
```
On datafusion 51
```shell
andrewlamb@Andrews-MacBook-Pro-3 ~ %
~/Software/datafusion-cli/datafusion-cli-51.0.0
```
You get the expected answer
```shell
> SELECT
x,
CAST(y AS BIGINT) % 2,
SUM(v)
FROM agg_src_sorted
GROUP BY x, CAST(y AS BIGINT) % 2
ORDER BY x, CAST(y AS BIGINT) % 2;
+---+-----------------------------+-----------------------+
| x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
+---+-----------------------------+-----------------------+
| 1 | 0 | 20 | <---- this row
is in the correct sopt
| 1 | 1 | 40 |
| 2 | 0 | 50 |
| 2 | 1 | 100 |
+---+-----------------------------+-----------------------+
4 row(s) fetched.
Elapsed 0.002 seconds.
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]