vamshikrishnakyatham opened a new issue, #14016:
URL: https://github.com/apache/hudi/issues/14016
### Bug Description
**What happened:**
A CDC query returns a different number of rows when `ORDER BY ts_ms` is added.
For example, without ORDER BY, 9 rows are returned; with ORDER BY, 6 rows are
returned. Some records are missing from the ordered result.
**What you expected:**
A CDC query should return the same number of rows whether or not an ORDER BY
clause is used. Ordering should not affect which records are included in the
result set.
**Steps to reproduce:**
1. Create a Hudi table with CDC enabled and perform insert/update/delete
operations.
2. Run a CDC query without ORDER BY:
`SELECT op, ts_ms, ... FROM hudi_table_changes('table_path', 'cdc', 'earliest')`
3. Run the same CDC query with ORDER BY:
`SELECT op, ts_ms, ... FROM hudi_table_changes('table_path', 'cdc', 'earliest') ORDER BY ts_ms ASC`
4. Observe that the row counts and results differ between the two queries.
Run results:
```
spark-sql (default)> SELECT op, ts_ms, get_json_object(before, '$.ts') AS
before_ts, get_json_object(before, '$.rider') as before_rider,
get_json_object(after, '$.rider') AS after_rider FROM
hudi_table_changes('file:///tmp/hudi_test_table', 'cdc', 'earliest')
> ;
i 20250924110448254 NULL NULL rider-E
i 20250924110628340 NULL NULL rider-G
u 20250924110905831 1695516137 rider-G rider-E
i 20250924110448254 NULL NULL rider-C
u 20250924110905831 1695091554 rider-C rider-E
i 20250924110448254 NULL NULL rider-A
i 20250924110448254 NULL NULL rider-F
u 20250924110551107 1695516137 rider-F rider-E
d 20250924110611165 1695516137 rider-E NULL
Time taken: 0.141 seconds, Fetched 9 row(s)
spark-sql (default)> SELECT op, ts_ms, get_json_object(before, '$.ts') AS
before_ts, get_json_object(before, '$.rider') as before_rider,
get_json_object(after, '$.rider') AS after_rider FROM
hudi_table_changes('file:///tmp/hudi_test_table', 'cdc', 'earliest') order by
ts_ms asc;
i 20250924110448254 NULL NULL rider-A
u 20250924110551107 1695516137 rider-F rider-E
d 20250924110611165 1695516137 rider-E NULL
i 20250924110628340 NULL NULL rider-G
u 20250924110905831 1695516137 rider-G rider-E
u 20250924110905831 1695091554 rider-C rider-E
Time taken: 0.214 seconds, Fetched 6 row(s)
```
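To make the discrepancy easier to triage, here is a minimal Python sketch that diffs the two outputs above as multisets. The rows are transcribed by hand from the run results (with `NULL` written as `None`); the variable names are illustrative and nothing here calls Hudi or Spark. It only shows which rows disappear, not why.

```python
from collections import Counter

# Rows are (op, ts_ms, before_ts, before_rider, after_rider),
# copied from the spark-sql output without ORDER BY.
unordered = [
    ("i", "20250924110448254", None, None, "rider-E"),
    ("i", "20250924110628340", None, None, "rider-G"),
    ("u", "20250924110905831", "1695516137", "rider-G", "rider-E"),
    ("i", "20250924110448254", None, None, "rider-C"),
    ("u", "20250924110905831", "1695091554", "rider-C", "rider-E"),
    ("i", "20250924110448254", None, None, "rider-A"),
    ("i", "20250924110448254", None, None, "rider-F"),
    ("u", "20250924110551107", "1695516137", "rider-F", "rider-E"),
    ("d", "20250924110611165", "1695516137", "rider-E", None),
]

# Rows copied from the spark-sql output with ORDER BY ts_ms ASC.
ordered = [
    ("i", "20250924110448254", None, None, "rider-A"),
    ("u", "20250924110551107", "1695516137", "rider-F", "rider-E"),
    ("d", "20250924110611165", "1695516137", "rider-E", None),
    ("i", "20250924110628340", None, None, "rider-G"),
    ("u", "20250924110905831", "1695516137", "rider-G", "rider-E"),
    ("u", "20250924110905831", "1695091554", "rider-C", "rider-E"),
]

# Multiset difference: rows present without ORDER BY but absent with it.
missing = sorted((Counter(unordered) - Counter(ordered)).elements(), key=str)
# Rows present only in the ordered result (none are expected).
extra = list((Counter(ordered) - Counter(unordered)).elements())

for row in missing:
    print("missing from ordered result:", row)
print("rows only in ordered result:", extra)
```

In this run, the three missing rows are all `i` records with `ts_ms` 20250924110448254 (after_rider `rider-E`, `rider-C`, `rider-F`); only the `rider-A` insert with that `ts_ms` survives the ordered query. No row appears only in the ordered result.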
### Environment
**Hudi version:** 1.1
**Query engine:** Spark (spark-sql)
**Relevant configs:**
### Logs and Stack Trace
_No response_