manipatnam opened a new pull request, #67531:
URL: https://github.com/apache/airflow/pull/67531

   ## Problem
   
   Sorting mapped task instances by `rendered_map_index` produced incorrect 
**lexicographic order** instead of numeric order:
   
   ```
   Got:    "0", "1", "10", "100", "101", "11", "2", "3" ...
   Wanted:  0,   1,   2,   3, ..., 10,  11, ..., 100, 101
   ```
   
   Additionally, any request using cursor pagination with 
`order_by=rendered_map_index` returned a **500 error**:
   
   ```
   NotImplementedError: Cursor pagination does not support column-form
   ``to_replace`` mapping for ``rendered_map_index``.
   ```
   
   ## Fix
   
   Map `rendered_map_index` to the integer `map_index` column via a **string 
alias** in `SortParam.to_replace`:
   
   ```python
   to_replace={
       ...
       "rendered_map_index": "map_index",  # routes ORDER BY through integer 
column
   }
   ```
   
   A string alias goes through `getattr(row, "map_index")` in `row_value`, so 
cursor pagination works correctly without any changes to `cursors.py`.
   
   Also adds a `secondary_sort` parameter to `SortParam` as generic 
infrastructure for appending tiebreaker columns to `ORDER BY`.
   
   ## ⚠️ Known side effect — input needed
   
   For tasks with **custom labels** (set via `map_index_template`), sorting by 
`rendered_map_index` now sorts by the underlying **integer creation index** 
rather than alphabetically by label.
   
   Example:
   ```python
   @task(map_index_template="{{ user }}")
   def process(user: str): ...
   
   process.expand(user=["zebra", "apple", "mango", "cherry", "banana"])
   ```
   
   | map_index | label | Sort by `rendered_map_index` (before) | Sort by 
`rendered_map_index` (after) |
   |---|---|---|---|
   | 0 | zebra | apple | zebra |
   | 1 | apple | banana | apple |
   | 2 | mango | cherry | mango |
   | 3 | cherry | mango | cherry |
   | 4 | banana | zebra | banana |
   
   **Before**: alphabetical by label ✓ but broken for numeric + 500 on cursor 
pagination
   **After**: integer creation order — same as `map_index` sort
   
   ## Alternatives considered
   
   The correct solution for both cases simultaneously (numeric → integer order, 
labeled → alphabetical) requires one of:
   
   1. **Zero-padded string expression** — `CASE WHEN _rendered_map_index IS NOT 
NULL THEN _rendered_map_index ELSE printf('%010d', map_index) END` — medium 
complexity, requires DB-specific `printf`/`lpad` + a Python callable in 
`row_value` for cursor encoding
   2. **Multi-column cursor expansion** — expand `rendered_map_index` into 
`(_rendered_map_index, map_index)` in `_resolve()`, with NULL-aware keyset 
predicates in `cursors.py` — high complexity, touches correctness-critical 
pagination code
   3. **Pre-computed sort column** — store a ready-to-sort string in a new DB 
column at TI creation time — cleanest long-term, but requires a migration
   
   This PR takes the minimal safe approach and raises the question: **what 
should `rendered_map_index` sort semantics be for labeled instances?** If 
alphabetical label sort is important, option 1 or 3 above would be the next 
step.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Sonnet 4.6
   
   Generated-by: Claude Sonnet 4.6 following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to