antoniopetrole opened a new issue, #928:
URL: https://github.com/apache/cloudberry/issues/928

   ### Apache Cloudberry version
   
   Cloudberry 1.6.0 (pre apache release)
   
   ### What happened
   
   One of our users ran into a query with our data that has some really odd 
behavior. Whenever the query is ran, the order by clause returns data 
non-deterministically. For example, the value for col1 should return the value 
9217 for the first 36 rows. However on Cloudberry everytime you run it, it 
seems to return out of order in different ways each time you run it. Sometimes 
it will only be the first 3 rows that have col1 = 9217, the next run it might 
be 1 or 10. I made a test case data set you with the query that you can test 
this against and have included it attached to this issue. I also tested this on 
Postgres and on Postgres it behaves exactly the same and as expected every 
time. I tried testing this on my docker multinode bootcamp cluster to no avail, 
however I was able to recreate it both in our production and dev environments 
using this test dataset. I imagine the # of segments has some influence on this 
in some way.
   
   
[orderby_query.zip](https://github.com/user-attachments/files/18792612/orderby_query.zip)
   
   ### What you think should happen instead
   
   Results should be deterministic and follow the ORDER BY declaration like in 
Postgres
   
   ### How to reproduce
   
   To recreate, just run `psql -f problem_ddl_cloudberry.sql` and then `psql -f 
problem_query.sql` and on each run you'll see different results. I also 
included the Postgres DDL version of this as well. A few notes I've taken that 
might be helpful to know
   
   * All 36 rows that we're looking for appear in the results even when out of 
order
   * Putting the row number as the first column seems to fix it
   * Adding/removing columns from the subquery where the UNION operator is 
seems to randomly fix or keep the bug. You can remove existing columns or add 
columns like gp_segment_id
   * If you persist the results of the subquery to a table beforehand and then 
just include a SELECT * FROM <persisted_table>, this completely fixes the bug 
as well. So I suspect that the issue lies somewhere within the subquery but I 
don't know that for sure.
   
   ### Examples Runs On Cloudberry (focus on how many rows represent col1 = 
9217)
   ```
    col1  | col2  |  col3  | col4 |            col8            |          unid  
         | island_start |  rn
   
-------+-------+--------+------+----------------------------+-------------------------+--------------+-------
     9217 | 32408 | 172087 |    1 | 2024-11-24 13:07:31.508746 | 
1|30|12|259200|0|0      |            1 |     1
    21358 | 52711 | 214706 |    1 | 2024-11-24 16:21:59.720578 | 
1|2443|4|86400|0|0      |            1 |     2
    21358 | 52711 | 214706 |    1 | 2024-11-24 17:20:01.451113 | 
1|2443|4|86400|0|0      |            0 |     3
   
    col1  | col2  |  col3  | col4 |            col8            |          unid  
         | island_start |  rn
   
-------+-------+--------+------+----------------------------+-------------------------+--------------+-------
     9217 | 32408 | 172083 |    1 | 2024-11-24 15:07:28.490719 | 
1|2406|4|86400|0|0      |            1 |     1
     9217 | 32408 | 172087 |    1 | 2024-11-24 13:07:31.508746 | 
1|30|12|259200|0|0      |            1 |     2
    21358 | 52711 | 214707 |    1 | 2024-11-24 17:20:01.525726 | 
1|253|12|259200|0|0     |            1 |     3
   
    col1  | col2  |  col3  | col4 |            col8            |          unid  
         | island_start |  rn
   
-------+-------+--------+------+----------------------------+-------------------------+--------------+-------
     9217 | 32408 | 172083 |    1 | 2024-11-24 15:07:28.490719 | 
1|2406|4|86400|0|0      |            1 |     1
     9217 | 32408 | 172087 |    1 | 2024-11-24 13:07:31.508746 | 
1|30|12|259200|0|0      |            1 |     2
     9217 | 32408 | 172088 |    1 | 2024-11-24 15:07:28.722085 | 
1|30|6|1296000|0|0      |            1 |     3
    21358 | 52711 | 214706 |    1 | 2024-11-24 14:19:23.533454 | 
1|2443|4|86400|0|0      |            1 |     4
   
    col1  | col2  |  col3  | col4 |            col8            |          unid  
         | island_start |  rn
   
-------+-------+--------+------+----------------------------+-------------------------+--------------+-------
    21358 | 52711 | 214706 |    1 | 2024-11-24 14:19:23.533454 | 
1|2443|4|86400|0|0      |            1 |     1
    21358 | 52711 | 214708 |    1 | 2024-11-24 11:36:34.576488 | 
1|294|6|1296000|0|0     |            1 |     2
    21358 | 52711 | 214708 |    1 | 2024-11-24 14:19:23.757234 | 
1|294|6|1296000|0|0     |            0 |     3
   ```
   
   ### Operating System
   
   Rocky Linux 8.10 (Green Obsidian)
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes, I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to