alamb opened a new issue, #20244:
URL: https://github.com/apache/datafusion/issues/20244

   ### Describe the bug
   
   While debugging the DataFusion 52 upgrade, I found a wrong-results bug with pre-sorted data that was introduced in 52.
   
   ### To Reproduce
   
   ```sql
   CREATE TABLE agg_src(x INT, y INT, v INT) AS VALUES
   (1, 1, 10),
   (1, 2, 20),
   (1, 3, 30),
   (2, 1, 40),
   (2, 2, 50),
   (2, 3, 60);
   
   -- Create an ordered table:
   COPY (SELECT * FROM agg_src ORDER BY x, y) TO 'foo.parquet';
   ```
   
   Then run
   ```sql
   CREATE EXTERNAL TABLE agg_src_sorted(x INT, y INT, v INT) STORED AS PARQUET 
LOCATION 'foo.parquet' WITH ORDER (x ASC, y ASC);
   
   -- This query orders by an expression of y that breaks the ordering
   SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   
   ```
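As a sanity check on what the query should return, here is a minimal sketch in plain Python (not DataFusion code; the variable names are made up for illustration). It uses the same six rows, shows that the `(x, y % 2)` keys are not sorted even though the input is sorted on `(x, y)`, and computes the grouped sums in key order:

```python
from collections import defaultdict

# Same rows as agg_src, already sorted by (x, y)
rows = [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 40), (2, 2, 50), (2, 3, 60)]

# Group keys in input order: sorted on (x, y) does not imply sorted on (x, y % 2)
keys = [(x, y % 2) for x, y, _ in rows]
input_order_is_sorted = keys == sorted(keys)

# SUM(v) per (x, y % 2) group
sums = defaultdict(int)
for x, y, v in rows:
    sums[(x, y % 2)] += v

# Rows in the order requested by ORDER BY x, y % 2
expected = [(x, m, s) for (x, m), s in sorted(sums.items())]

print(input_order_is_sorted)
print(expected)
```

Because the keys arrive out of order, an explicit sort on `(x, y % 2)` is still needed after the aggregation; the input ordering on `(x, y)` alone does not provide it.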
   
   
   With DataFusion 52, you get the wrong answer:
   ```shell
   andrewlamb@Andrews-MacBook-Pro-3 ~ % ~/Software/datafusion-cli/datafusion-cli-52.1.0
   ```
   
   ```sql
   > SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   +---+-----------------------------+-----------------------+
   | x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
   +---+-----------------------------+-----------------------+
   | 1 | 1                           | 40                    |
   | 1 | 0                           | 20                    | <---- the second column is 1 then 0, rather than 0 then 1
   | 2 | 1                           | 100                   |
   | 2 | 0                           | 50                    |
   +---+-----------------------------+-----------------------+
   4 row(s) fetched.
   Elapsed 0.006 seconds.
   ```
   
   On DataFusion 51:
   ```shell
   andrewlamb@Andrews-MacBook-Pro-3 ~ % ~/Software/datafusion-cli/datafusion-cli-51.0.0
   ```
   
   you get the expected answer:
   ```sql
   > SELECT
     x,
     CAST(y AS BIGINT) % 2,
     SUM(v)
   FROM agg_src_sorted
   GROUP BY x, CAST(y AS BIGINT) % 2
   ORDER BY x, CAST(y AS BIGINT) % 2;
   +---+-----------------------------+-----------------------+
   | x | agg_src_sorted.y % Int64(2) | sum(agg_src_sorted.v) |
   +---+-----------------------------+-----------------------+
   | 1 | 0                           | 20                    | <---- this row is in the correct spot
   | 1 | 1                           | 40                    |
   | 2 | 0                           | 50                    |
   | 2 | 1                           | 100                   |
   +---+-----------------------------+-----------------------+
   4 row(s) fetched.
   Elapsed 0.002 seconds.
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


