amoghrajesh commented on issue #65011: URL: https://github.com/apache/airflow/issues/65011#issuecomment-4251037966
Thanks for that input @shaleena. Going by the code path, your workaround eliminates this:

- `do_xcom_push = False` means that `GlueJobRunDetailsLink.persist()` returns early, i.e. `glue_job_run_details` isn't written
- That implies `_push_xcom_if_needed` exits early as well, so `return_value` isn't written as an XCom either
- `_link_GlueJobRunDetailsLink` is still written by `finalize()` every time

That last point is important. `_link_GlueJobRunDetailsLink` survives retries and clears without issue because it is always included in `xcom_keys_to_clear` when a task is about to start running. The two XCom keys causing issues are exactly the ones that can be written during a phase that skips cleanup.

So the only explanation I have is that something causes the task to resume again (without `next_method` being cleared), in which case the old XComs will still be there because of https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py#L257-L267.

Could you check whether the trigger timed out or failed? I now suspect this: https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/trigger.py#L314 (non-None). When the task re-runs, the API endpoint sees a non-None `next_method` because of the above and performs zero cleanup, but `glue_job_run_details` was already there from the first leg of the run.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
