YoannAbriel opened a new pull request, #63581:
URL: https://github.com/apache/airflow/pull/63581

   ## Problem
   
   When multiple workers try to write rendered task instance fields (RTIF) for 
the same task instance simultaneously, the API server returns a `409 Conflict` 
error due to a unique constraint violation on 
`rendered_task_instance_fields_pkey`. This causes the task runner to fail with 
`AirflowRuntimeError`, marking the task as failed even though it completed 
successfully.
   
   This is particularly common with CeleryExecutor when parallel tasks render 
fields at the same time, or when task retries overlap.
   
   Closes: #61705
   
   ## Root Cause
   
   The `update_rtif` method uses `session.merge()`, which performs a 
SELECT-then-INSERT/UPDATE sequence. When two concurrent requests both SELECT 
and find no existing record, they both attempt an INSERT, and the second one 
fails with an `IntegrityError`.
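
The interleaving described above can be reproduced deterministically with a small sketch. It uses the standard-library `sqlite3` module and a hypothetical, simplified `rtif` table rather than Airflow's real schema or `session.merge()`, and it replays both writers on a single connection purely to force the ordering:

```python
import sqlite3

# Hypothetical, simplified stand-in for the RTIF table; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtif (ti_key TEXT PRIMARY KEY, fields TEXT NOT NULL)")


def record_exists(ti_key: str) -> bool:
    """The SELECT half of a merge-style write."""
    row = conn.execute("SELECT 1 FROM rtif WHERE ti_key = ?", (ti_key,)).fetchone()
    return row is not None


# Both writers run their SELECT before either one has inserted anything.
writer_a_sees_row = record_exists("ti-1")
writer_b_sees_row = record_exists("ti-1")

# Writer A found nothing, so it INSERTs; this succeeds.
conn.execute("INSERT INTO rtif VALUES (?, ?)", ("ti-1", '{"cmd": "from A"}'))

# Writer B also found nothing, so it INSERTs too and violates the primary key.
try:
    conn.execute("INSERT INTO rtif VALUES (?, ?)", ("ti-1", '{"cmd": "from B"}'))
    second_insert_error = None
except sqlite3.IntegrityError as exc:
    second_insert_error = exc
```

Neither writer did anything wrong in isolation; the failure comes only from the gap between the existence check and the insert.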
   
   The global `_UniqueConstraintErrorHandler` catches this IntegrityError and 
converts it to a `409 Conflict` HTTP response, which the task-sdk treats as a 
fatal error.
   
   ## Fix
   
   Handle `IntegrityError` in the `ti_put_rtif` endpoint with a retry strategy:
   
   1. Catch `IntegrityError` from the first `update_rtif` call
   2. Roll back the failed transaction
   3. Re-fetch the task instance (since the previous ORM object is detached 
after rollback)
   4. Retry `update_rtif`; this time `session.merge()` will find the existing 
record and perform an UPDATE instead of an INSERT
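
The catch, roll back, retry shape of these steps can be sketched as follows. This is not the code from the PR: `merge_write`, `put_rtif`, the `rtif` table, and the `simulate_race` flag are hypothetical names, `sqlite3` stands in for the SQLAlchemy session, and the ORM re-fetch in step 3 has no equivalent here, so it appears only as a comment.

```python
import sqlite3


def merge_write(conn, ti_key, fields, *, stale_read=False):
    """SELECT, then INSERT or UPDATE. `stale_read=True` fakes a lost race."""
    found = None
    if not stale_read:
        found = conn.execute(
            "SELECT 1 FROM rtif WHERE ti_key = ?", (ti_key,)
        ).fetchone()
    if found is None:
        conn.execute("INSERT INTO rtif VALUES (?, ?)", (ti_key, fields))
    else:
        conn.execute("UPDATE rtif SET fields = ? WHERE ti_key = ?", (fields, ti_key))


def put_rtif(conn, ti_key, fields, *, simulate_race=False):
    try:
        merge_write(conn, ti_key, fields, stale_read=simulate_race)
    except sqlite3.IntegrityError:
        # Step 2: discard the failed transaction.
        conn.rollback()
        # Step 3 (re-fetching the ORM object) would happen here in the real endpoint.
        # Step 4: the retry now sees the other writer's row and takes the UPDATE path.
        merge_write(conn, ti_key, fields)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtif (ti_key TEXT PRIMARY KEY, fields TEXT NOT NULL)")

# Another writer has already committed a row for the same key.
conn.execute("INSERT INTO rtif VALUES (?, ?)", ("ti-1", "from A"))
conn.commit()

# This writer loses the race on its first attempt and succeeds on the retry.
put_rtif(conn, "ti-1", "from B", simulate_race=True)
final_fields = conn.execute(
    "SELECT fields FROM rtif WHERE ti_key = ?", ("ti-1",)
).fetchone()[0]
```

The retry is attempted only once in this sketch: after the rollback the conflicting row is already committed, so the second SELECT is expected to find it.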
   
   This is safe because RTIF writes are idempotent: the last writer wins, 
which is the correct semantics for rendered template fields.
   
   ## Testing
   
   - Added `test_ti_put_rtif_concurrent_write`: verifies that two sequential 
writes to the same RTIF succeed (the second updates rather than conflicts)
   - Added `test_ti_put_rtif_integrity_error_handled`: simulates the race 
condition by mocking `update_rtif` to raise `IntegrityError` on the first call, 
verifying the retry succeeds
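
The mocking approach in the second test can be illustrated with a standalone sketch. It is not the test added in this PR: `put_with_retry` is a hypothetical stand-in for the endpoint's retry logic, and `sqlite3.IntegrityError` substitutes for SQLAlchemy's exception so the example needs only the standard library.

```python
import sqlite3
from unittest import mock


def put_with_retry(update, rollback, payload):
    """Hypothetical stand-in for the endpoint's retry logic."""
    try:
        update(payload)
    except sqlite3.IntegrityError:
        rollback()
        update(payload)


def test_integrity_error_handled():
    # The first call raises; the second call returns normally.
    update = mock.Mock(side_effect=[sqlite3.IntegrityError("duplicate key"), None])
    rollback = mock.Mock()

    put_with_retry(update, rollback, {"cmd": "echo hi"})

    # The write was attempted twice, with exactly one rollback in between.
    assert update.call_count == 2
    rollback.assert_called_once_with()
```

Passing a list to `side_effect` makes the mock consume one entry per call, which is what lets a single mock fail once and then succeed.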
   
   Verified with unit tests. No external service dependencies.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Opus 4, claude-opus-4-6)
   
   Generated-by: Claude Code (Opus 4, claude-opus-4-6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   
   

