What is the motivation behind storing internal state in a task, instead of splitting the logic on state boundaries into multiple tasks? That’s what the task abstraction is supposed to be for, and you wouldn’t need a separate mechanism for it; regular XCom would just work.
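To make the alternative concrete, here is a minimal sketch of splitting on state boundaries. The function names, return values, and the three-step breakdown are all hypothetical and only loosely modelled on the "allocated resources / downstream job IDs" case in the proposal; I am assuming the TaskFlow API via `airflow.decorators` (on Airflow 3 the same names are also importable from `airflow.sdk`).

```python
from datetime import datetime


def provision_cluster() -> str:
    """Expensive setup step; the returned ID is the state worth keeping."""
    return "cluster-123"  # hypothetical placeholder value


def submit_job(cluster_id: str) -> str:
    """Submit work to the already-provisioned cluster and return a job ID."""
    return f"job-on-{cluster_id}"  # hypothetical placeholder value


def wait_for_job(job_id: str) -> str:
    """Poll the job until it finishes (polling elided in this sketch)."""
    return f"{job_id}:done"


def build_dag():
    # Imported lazily so the plain functions above can be exercised
    # without Airflow installed. Assumes the TaskFlow API.
    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=datetime(2025, 1, 1), catchup=False)
    def split_on_state_boundaries():
        # Each return value is pushed to XCom by the task that produced it.
        cluster_id = task(provision_cluster)()
        # Retrying this task clears only its own XCom, not the upstream
        # cluster_id, so the expensive setup is not repeated.
        job_id = task(retries=3)(submit_job)(cluster_id)
        task(retries=3)(wait_for_job)(job_id)

    return split_on_state_boundaries()
```

With this layout, a transient failure in `wait_for_job` retries only that task, and it still receives the same `job_id` from upstream through regular XCom.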
While storing state is a legitimate use case, I feel this particular idea would have a net negative impact by encouraging people to do too many things in one task. I’d even argue the examples given in the Confluence document already do so.

TP

> On 14 Nov 2025, at 08:32, Xiaodong Deng <[email protected]> wrote:
>
> Hi folks!
>
> We would like to propose a new feature in Airflow, a boolean
> parameter "persist_xcom_through_retry" Parameter in all Airflow Operators.
> Our team added this feature in our internal fork a few years back, and it
> has been benefiting our users extensively.
>
> *I have created an AIP
> at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333*.
> Below is a summary (in the complete AIP, we have a more detailed problem
> statement and quite a few interesting use-case examples):
>
> *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
> each other”. However, XCom also has the capacity and potential to help
> persist and manage task state within a task itself. Currently, Apache
> Airflow automatically clears a task instance’s XCom data when it is
> retried. This behavior, while ensuring clean state for retry attempts,
> creates limitations:*
>
> - *Loss of Internal Progress: Tasks that have internal checkpointing or
> progress tracking lose all intermediate state on retry, forcing restart
> from the beginning.*
> - *Resource State Loss: Tasks cannot maintain state about allocated
> resources (compute instances, downstream job IDs, etc.) across retry
> attempts, leading to redundant expensive setup operations.*
> - *No Recovery/Resume Capability: There's no way for tasks to resume
> from internal checkpoints when transient failures occur during
> long-running atomic operations.*
> - *Poor User Experience: users must implement external state management
> systems to work around this limitation, adding complexity to DAG authoring.*
>
> *This proposal aims at extending the capacity of XCom by allowing
> persisting a Task Instance’s XCom through its retries, enabling users to
> build more resilient and efficient pipelines. This is particularly useful
> for the type of tasks which are atomic (so one such task cannot be split
> into multiple tasks) and need to manage internal state or checkpoints.*
>
> We look forward to your feedback and thoughts. Thanks!
>
> Regards,
>
> XD
