shaleena commented on issue #65011:
URL: https://github.com/apache/airflow/issues/65011#issuecomment-4252088077
Thanks @amoghrajesh. I checked the logs for both the deferrable and non-deferrable tries, and they seem to rule out the trigger-timeout / failed-trigger theory for this case.
### Deferrable try 1
The trigger path looks normal:
```text
[2026-04-10 06:01:05] INFO - Status of AWS Glue job is: RUNNING
[2026-04-10 06:02:05] INFO - Status of AWS Glue job is: RUNNING
[2026-04-10 06:03:06] INFO - Status of AWS Glue job is: RUNNING
[2026-04-10 06:04:06] INFO - Trigger fired event ...
result=TriggerEvent<{'status': 'success', 'run_id': 'jr_7bf3d105...'}>
[2026-04-10 06:04:06] INFO - trigger completed ...
```
The task then resumes on:
```text
[2026-04-10 06:04:08] INFO - TaskInstance Details ... try_number=1
```
and immediately fails during the standard XCom auto-push path:
```text
[2026-04-10 06:04:08] INFO - Pushing xcom ti=RuntimeTaskInstance(...)
[2026-04-10 06:04:08] ERROR - Task failed with exception
duplicate key value violates unique constraint "xcom_pkey"
DETAIL: Key (dag_run_id, task_id, map_index, key)=(10645, run_job_task, -1,
return_value) already exists.
```
```text
task_runner.py ... _push_xcom_if_needed
task_runner.py ... _xcom_push
xcom.py ... set
comms.py ... send
```
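The constraint in the DETAIL line can be illustrated with a minimal sketch in Python's `sqlite3`, using a simplified stand-in table. The table layout, the `value` column, and the `push` / `second_push_fails` helpers are hypothetical and are not Airflow's metadata schema or XCom code. The sketch only shows that a second insert for the same `(dag_run_id, task_id, map_index, key)` is rejected even when the value (here, a different Glue run id) differs.

```python
import sqlite3


def make_xcom_table() -> sqlite3.Connection:
    # Hypothetical, simplified stand-in for the xcom table: only the key
    # columns named in the DETAIL line plus a value column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE xcom (
            dag_run_id INTEGER NOT NULL,
            task_id    TEXT    NOT NULL,
            map_index  INTEGER NOT NULL,
            "key"      TEXT    NOT NULL,
            value      TEXT,
            PRIMARY KEY (dag_run_id, task_id, map_index, "key")
        )
        """
    )
    return conn


def push(conn: sqlite3.Connection, dag_run_id: int, task_id: str,
         map_index: int, key: str, value: str) -> None:
    # Plain INSERT (no upsert), so a repeated composite key raises IntegrityError.
    conn.execute(
        'INSERT INTO xcom (dag_run_id, task_id, map_index, "key", value) '
        "VALUES (?, ?, ?, ?, ?)",
        (dag_run_id, task_id, map_index, key, value),
    )


def second_push_fails() -> bool:
    conn = make_xcom_table()
    push(conn, 10645, "run_job_task", -1, "return_value", "jr_first")
    try:
        # Same (dag_run_id, task_id, map_index, key), different value.
        push(conn, 10645, "run_job_task", -1, "return_value", "jr_second")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(second_push_fails())
```

Only the key columns matter for the collision, which is consistent with try 1 failing even though its trigger event carried its own `run_id`.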
And after failure:
```text
[2026-04-10 06:04:09] WARNING - No XCom value found; defaulting to None.
key=glue_job_run_details ...
```
### Deferrable try 2
The same DAG run then shows the same pattern on try 2:
```text
[2026-04-10 06:12:16] INFO - Trigger fired event ...
result=TriggerEvent<{'status': 'success', 'run_id': 'jr_e92acd...'}>
[2026-04-10 06:12:16] INFO - trigger completed ...
[2026-04-10 06:12:40] INFO - TaskInstance Details ... try_number=2
[2026-04-10 06:12:41] INFO - Pushing xcom ti=RuntimeTaskInstance(...)
[2026-04-10 06:12:41] ERROR - Task failed with exception
DETAIL: Key (dag_run_id, task_id, map_index, key)=(10645, run_job_task, -1,
return_value) already exists.
```
This suggests:
* the trigger is completing successfully
* the failure already occurs on the resumed leg of **try 1**
* so this does not appear to depend on:
  * trigger timeout
  * trigger failure
  * stale XCom left behind only by an earlier retry
For this run, the conflicting `return_value` row appears to exist already by the time `_push_xcom_if_needed` runs on the first resumed attempt.
We also see a similar duplicate-`return_value` failure in a **non-deferrable** run (on both try 1 and try 2), which suggests the problem may be broader than the deferrable resume / `next_method` path alone.
Our workaround remains the same:
* `do_xcom_push = False`
* avoid `return_value`
* extract `run_id` from `event["run_id"]`
* push a custom key instead
That avoids the failure consistently in both deferrable and non-deferrable
modes.
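The workaround can be modelled in plain Python as a rough sketch. `InMemoryXCom`, `handle_event`, `auto_push`, and the key name `glue_job_run_id` are all hypothetical. `auto_push` is a simplified model of the `return_value` auto-push and is not Airflow's actual `_push_xcom_if_needed`. The sketch assumes a `return_value` row already exists (as in the logs) and shows that pushing `event["run_id"]` under a custom key, with `do_xcom_push = False`, never touches that row.

```python
from typing import Any, Dict, Optional, Tuple

# (dag_run_id, task_id, map_index, key), mirroring the DETAIL line.
XComKey = Tuple[int, str, int, str]


class DuplicateXComError(Exception):
    """Raised when a composite XCom key is pushed twice (hypothetical)."""


class InMemoryXCom:
    """Hypothetical store enforcing one value per composite key."""

    def __init__(self) -> None:
        self.rows: Dict[XComKey, Any] = {}

    def push(self, dag_run_id: int, task_id: str, key: str,
             value: Any, map_index: int = -1) -> None:
        pk = (dag_run_id, task_id, map_index, key)
        if pk in self.rows:
            raise DuplicateXComError(f"duplicate key {pk}")
        self.rows[pk] = value


def handle_event(event: Dict[str, Any], xcom: InMemoryXCom, dag_run_id: int,
                 task_id: str, key: str = "glue_job_run_id") -> str:
    # Workaround: read run_id from the trigger event and push it under a
    # custom key instead of relying on return_value.
    run_id = event["run_id"]
    xcom.push(dag_run_id, task_id, key, run_id)
    return run_id


def auto_push(result: Optional[Any], do_xcom_push: bool, xcom: InMemoryXCom,
              dag_run_id: int, task_id: str) -> None:
    # Simplified model of the return_value auto-push; skipped entirely
    # when do_xcom_push is False.
    if do_xcom_push and result is not None:
        xcom.push(dag_run_id, task_id, "return_value", result)


if __name__ == "__main__":
    store = InMemoryXCom()
    # Pre-existing row, as observed in the logs.
    store.rows[(10645, "run_job_task", -1, "return_value")] = "jr_existing"
    rid = handle_event({"status": "success", "run_id": "jr_new"}, store,
                       10645, "run_job_task")
    auto_push(rid, False, store, 10645, "run_job_task")  # no collision
    print(store.rows[(10645, "run_job_task", -1, "glue_job_run_id")])
```

With `do_xcom_push = True` the same call to `auto_push` would raise `DuplicateXComError`, which matches the behaviour reported above.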
Both jobs failed on a scheduled run; no retry or clear was performed. We can share the full logs if needed.