steveahnahn opened a new pull request, #68869:
URL: https://github.com/apache/airflow/pull/68869
Cursor (keyset) paginated REST list endpoints (`GET /dags/{dag_id}/dagRuns`
and `GET .../taskInstances`) silently dropped rows when sorted by a nullable
column such as `start_date`, `end_date`, `duration`, or `state`. The keyset
predicate and the generated `ORDER BY` disagreed on where NULLs sort, so once a
page boundary fell on the NULL/non-NULL edge, every row on one side of it was
skipped, with no error.
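The failure mode can be reproduced outside Airflow. The sketch below is a minimal illustration on SQLite, not Airflow's actual query: the table `ti`, its rows, and the helper names `naive_page` and `naive_paginate` are all hypothetical. SQLite sorts NULLs first when ascending, but a plain comparison predicate against a NULL cursor value matches no row, so pagination stops early without raising.

```python
import sqlite3

# Hypothetical table standing in for task instances; not Airflow's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ti (id INTEGER PRIMARY KEY, start_date TEXT)")
conn.executemany(
    "INSERT INTO ti (id, start_date) VALUES (?, ?)",
    [(1, None), (2, None), (3, "2024-01-01"), (4, "2024-01-02"), (5, "2024-01-03")],
)


def naive_page(cursor, limit):
    """Fetch one page using a keyset predicate that ignores NULL placement."""
    if cursor is None:
        sql = "SELECT id, start_date FROM ti ORDER BY start_date ASC, id ASC LIMIT ?"
        return conn.execute(sql, (limit,)).fetchall()
    last_date, last_id = cursor
    sql = (
        "SELECT id, start_date FROM ti "
        "WHERE start_date > ? OR (start_date = ? AND id > ?) "
        "ORDER BY start_date ASC, id ASC LIMIT ?"
    )
    return conn.execute(sql, (last_date, last_date, last_id, limit)).fetchall()


def naive_paginate(limit):
    """Walk all pages and collect the ids that were actually returned."""
    seen, cursor = [], None
    while True:
        rows = naive_page(cursor, limit)
        if not rows:
            return seen
        seen.extend(row[0] for row in rows)
        last_id, last_date = rows[-1]
        cursor = (last_date, last_id)


# Page size 2 ends the first page on a NULL row: ids 3, 4, 5 are never returned.
print(naive_paginate(2))
# Page size 3 ends the first page on a non-NULL row, so nothing is lost.
print(naive_paginate(3))
```

Whether rows go missing depends only on where the page boundary lands, which is why the problem is easy to miss in testing.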
This is reachable from the shipping web UI: the Task Instances and Dag Runs
lists paginate by cursor and let you sort by clicking a column header, so
sorting by **Start Date** while some queued (not yet started) task instances
are present makes rows silently disappear from the grid.
### Fix
`NULLS FIRST/LAST` is not portable (unsupported on MySQL and older SQLite),
so the cursor path pins NULL placement with a portable `CASE`-based null-rank
key. Both the keyset `ORDER BY` and the keyset predicate use the same key, and
therefore can no longer disagree on any backend. The rank follows the column's sort
direction, so NULLs sort as the largest value (last when ascending, first when
descending), matching PostgreSQL's default. PostgreSQL result ordering is
therefore unchanged; SQLite and MySQL NULL ordering shifts to align with
PostgreSQL.
- Cursor token format is unchanged (the rank is derived from the decoded
value, not encoded), so existing cursors keep working.
- Offset pagination is untouched.
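As a rough illustration of the null-rank idea for an ascending sort, the sketch below uses SQLite and raw SQL. It is not the PR's implementation: the table `ti`, the `NULL_RANK` expression, and the helpers `make_db`, `ranked_page`, and `ranked_paginate` are hypothetical names. The cursor carries only the last row's value and id; the rank is derived from whether that value is NULL, which mirrors the unchanged token format described above.

```python
import sqlite3

# Hypothetical rank expression: NULL sorts as the largest value when ascending.
NULL_RANK = "CASE WHEN start_date IS NULL THEN 1 ELSE 0 END"
ORDER_BY = f"ORDER BY {NULL_RANK} ASC, start_date ASC, id ASC"


def make_db():
    """Build an in-memory table with NULL and non-NULL start dates mixed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ti (id INTEGER PRIMARY KEY, start_date TEXT)")
    conn.executemany(
        "INSERT INTO ti (id, start_date) VALUES (?, ?)",
        [(1, None), (2, "2024-01-02"), (3, None), (4, "2024-01-01"), (5, "2024-01-01")],
    )
    return conn


def ranked_page(conn, cursor, limit):
    """Fetch one page; the predicate uses the same rank as the ORDER BY."""
    if cursor is None:
        where, params = "", ()
    else:
        last_date, last_id = cursor
        if last_date is None:
            # Cursor row has rank 1, so only later NULL rows remain.
            where = "WHERE start_date IS NULL AND id > ?"
            params = (last_id,)
        else:
            # Cursor row has rank 0, so every NULL row (rank 1) still follows.
            where = (
                "WHERE start_date IS NULL OR start_date > ? "
                "OR (start_date = ? AND id > ?)"
            )
            params = (last_date, last_date, last_id)
    sql = f"SELECT id, start_date FROM ti {where} {ORDER_BY} LIMIT ?"
    return conn.execute(sql, params + (limit,)).fetchall()


def ranked_paginate(conn, limit):
    """Walk all pages and collect ids in the order they were returned."""
    seen, cursor = [], None
    while True:
        rows = ranked_page(conn, cursor, limit)
        if not rows:
            return seen
        seen.extend(row[0] for row in rows)
        last_id, last_date = rows[-1]
        cursor = (last_date, last_id)


conn = make_db()
for page_size in range(1, 6):
    print(page_size, ranked_paginate(conn, page_size))
```

Every page size yields the same full sequence, with the NULL rows last, because the predicate and the ordering agree on where NULLs sit.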
The `CASE` in `ORDER BY` means the sort on a nullable column cannot use a
plain column index. This is the cost of cross-backend portability; a
PostgreSQL-only `NULLS LAST` fast path could be a future optimization.
### Tests
- Fail-first endpoint regressions (`taskInstances` and `dagRuns`):
paginating by a nullable column drops rows before the fix and returns every
row after it.
- Forward/backward cursor consistency over a nullable column.
- Unit coverage for the keyset expansion (single and multiple nullable
columns, no-NULLs case, rank derivation, nullability detection).
closes: #68858
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes (Claude Code, Opus 4.8)
Generated-by: Claude Code (Opus 4.8) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)