XD,

I'd be happy to chat with you about some of the work I've been doing as part of AIP-93. We'd discussed breaking the AIP into two phases: the first is a generic state-store that could be used like Variable and/or XCom (but closer to Variable); the second is to leverage that state-store to provide a unified experience for Airflow users and developers interested in building Asset "Watchers". Hit me up in Slack, and we can talk a bit more.
Thanks,
Jake

On Wed, Nov 19, 2025 at 4:24 PM Xiaodong Deng <[email protected]> wrote:
> Hi Jarek,
>
> If you don't mind, I would suggest not framing the proposal as "time-to-market" vs. "good product". The team put thought and effort into it hoping for a "good product" too. It's a discussion and learning process for everyone, and "good"/"bad" is still somewhat subjective.
>
> As a long-time member of this community, I'm more than sure you don't mean anything negative. But for folks who are newly engaged, or in a similar situation, this may sound a little bit discouraging ;-)
>
>
> Regards,
> XD
>
> On 2025/11/18 22:45:52 Jarek Potiuk wrote:
> > Proposed Alternative:
> >
> > Complete and propose a regular "state" storage proposal. There were plenty of discussions about that, including the Asset Watermarks that Ash mentioned. I think the best way is to lead that discussion to completion and, as a result, come up with state management that can be used in this case as well.
> >
> > As mentioned in my previous mail, my thinking is that we are not in a "time-to-market" game; we are more in a "deliver a good product" one. If it takes more time, so be it, but let's do it properly. There is not much to lose by having it later, but there is a lot to lose collectively if our users start misusing a half-baked feature that misleads them into doing something we do not want them to do.
> >
> > J.
> >
> >
> > On Tue, Nov 18, 2025 at 11:25 PM Xiaodong Deng <[email protected]> wrote:
> >
> > > In addition, I understand we would like to stick to certain design principles. However, if they are blocking certain reasonable use cases, either alternative solutions need to be provided or the "principles" need to be adjusted.
> > >
> > > That's what I'm hoping for here.
> > >
> > > Thanks again!
> > >
> > >
> > > Regards,
> > > XD
> > >
> > > On 2025/11/18 22:20:36 Xiaodong Deng wrote:
> > > > Thanks for your valuable feedback, folks.
> > > >
> > > > Hi @TP,
> > > >
> > > > There are cases where breaking a task down into multiple tasks is not feasible or not the best option, for example use case 1 that I shared in the Confluence doc appendix.
> > > >
> > > > There are also cases where splitting into multiple tasks may seem to make sense but has downsides. In use cases 2 and 4 in the Confluence doc appendix, I explained why we do the work in a single task instead of splitting it into two tasks.
> > > >
> > > > Some tasks are simply atomic.
> > > >
> > > >
> > > > Hi @Jarek,
> > > >
> > > > I'm glad we are talking about idempotency. That's exactly why we sometimes cannot break down some tasks. I covered that to some extent in the "Problem Examples" section of the Confluence doc.
> > > >
> > > > I would love to discuss this more, or to learn from you about any alternative solutions that could become available to Airflow users in a timely manner.
> > > >
> > > > Many thanks!
> > > >
> > > >
> > > > Regards,
> > > > XD
> > > >
> > > > On 2025/11/16 09:48:10 Jarek Potiuk wrote:
> > > > > I agree with TP wholeheartedly. The basic reason XCom is deleted on restart is to maintain idempotency principles. If we allow XCom to be used to break idempotency (that's basically what per-task state is about), then XCom will stop serving its purpose.
> > > > >
> > > > > And of course, we are in a new "world" where we are not only supporting idempotent tasks. Various optimisations and different kinds of workloads require breaking the "old" idempotency rules we used to have when Airflow was used mainly for ETL. Deletion of XCom state was also questioned back then, because people **wanted** to use XCom in other ways. But we held strong, and I think that was a good choice.
> > > > >
> > > > > Repurposing XCom to do "something" else might seem like a good idea, even for Apple, because they could internally agree on some convention and use it as a "solution". But when you look at Airflow as a product, repurposing XCom to also do something else (i.e. storing state) seems a bit "lazy" and "short-cut-y".
> > > > >
> > > > > What does doing it this way save? A few things:
> > > > >
> > > > > * not having to do a database migration to implement a new feature
> > > > > * not having to define a clear API where state can be stored for various purposes at different levels (Task Instance, Task, maybe Task Group, Dag, and eventually Team)
> > > > > * not having to think about and prepare for all the various use cases people would really like to use it for
> > > > > * not having to write the use-case documentation explaining how you can use state
> > > > > * not having to write all the test cases making sure that all those use cases are served well
> > > > > * not thinking too much about the performance and security implications of those ("XCom has it already sorted out, I am sure it's going to be fine")
> > > > >
> > > > > Yes, it can be done much faster this way, and I understand some commercial users could have chosen this way as a shortcut to handle a specific use case they had in mind. This is absolutely understandable, and it is what I would even expect a for-profit company to do to shorten so-called "time-to-market" and start reaping the benefits faster.
> > > > >
> > > > > But should we do it the same way in Airflow? We are not a for-profit company; time-to-market for such a feature is secondary to stability, maintainability and having a "product" vision.
> > > > >
> > > > > I consider all the above points absolutely crucial properties of a "product", which Airflow is. They might not be needed in a "solution", but a good "product" absolutely requires all of them.
> > > > >
> > > > > When we switched to Airflow 3, one of the ideas was to remove all the bad "solution-y" decisions we made in the past that slowed us down in general and, more importantly, turned us (as Daniel used to say) into "back-compatibility engineers".
> > > > >
> > > > > Does it mean it will take longer and require more dedication, effort and discussion to agree on the scope? Absolutely. Is this a bad thing? I don't think so.
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > > On Sun, Nov 16, 2025 at 9:43 AM Tzu-ping Chung via dev <[email protected]> wrote:
> > > > >
> > > > > > What is the motivation behind storing internal state in a task, instead of splitting the logic on state boundaries into multiple tasks? That's what the task abstraction is meant for, and you wouldn't need a separate mechanism for that: regular XCom would just work.
> > > > > >
> > > > > > While storing state is a legitimate use case, I feel this particular idea would have a negative impact by encouraging people to do too many things in one task. I'd even argue the examples given in the Confluence document already do so.
> > > > > >
> > > > > > TP
> > > > > >
> > > > > >
> > > > > > > On 14 Nov 2025, at 08:32, Xiaodong Deng <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi folks!
> > > > > > >
> > > > > > > We would like to propose a new feature in Airflow: a boolean parameter "persist_xcom_through_retry" in all Airflow Operators.
> > > > > > >
> > > > > > > Our team added this feature to our internal fork a few years back, and it has been benefiting our users extensively.
> > > > > > >
> > > > > > > I have created an AIP at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333
> > > > > > > Below is a summary (the complete AIP has a more detailed problem statement and quite a few interesting use-case examples):
> > > > > > >
> > > > > > > *Traditionally, XCom is defined as "a mechanism that lets Tasks talk to each other". However, XCom also has the capacity and potential to help persist and manage task state within a task itself. Currently, Apache Airflow automatically clears a task instance's XCom data when it is retried. This behavior, while ensuring a clean state for retry attempts, creates limitations:*
> > > > > > >
> > > > > > > - *Loss of Internal Progress: Tasks that have internal checkpointing or progress tracking lose all intermediate state on retry, forcing a restart from the beginning.*
> > > > > > > - *Resource State Loss: Tasks cannot maintain state about allocated resources (compute instances, downstream job IDs, etc.) across retry attempts, leading to redundant, expensive setup operations.*
> > > > > > > - *No Recovery/Resume Capability: There's no way for tasks to resume from internal checkpoints when transient failures occur during long-running atomic operations.*
> > > > > > > - *Poor User Experience: Users must implement external state management systems to work around this limitation, adding complexity to DAG authoring.*
> > > > > > >
> > > > > > > *This proposal aims to extend the capacity of XCom by allowing a Task Instance's XCom to persist through its retries, enabling users to build more resilient and efficient pipelines. This is particularly useful for tasks that are atomic (i.e. cannot be split into multiple tasks) and need to manage internal state or checkpoints.*
> > > > > > >
> > > > > > > We look forward to your feedback and thoughts. Thanks!
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > XD
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > For additional commands, e-mail: [email protected]
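The checkpoint/resume behaviour the proposal targets, and that TP and Jarek argue should instead come from separate tasks or a dedicated state store, can be modelled without Airflow at all. The following is a minimal, hypothetical sketch: `StateStore`, `process_batches`, `run_with_retries`, the `(dag_id, task_id, run_id)` scope tuple and the `clear_on_retry` flag are all invented for illustration and are not Airflow APIs, and an in-memory dict stands in for whatever backend would actually hold the state. With `clear_on_retry=True` it mimics the current behaviour described in the thread, where a retry wipes the task instance's XCom; with `False` it mimics what `persist_xcom_through_retry` or an AIP-93-style state store would allow.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical per-task-instance state store (not an Airflow API).

    Entries are keyed by a scope tuple such as (dag_id, task_id, run_id).
    clear_on_retry=True mimics XCom being cleared on retry;
    clear_on_retry=False mimics state that survives retries.
    """

    clear_on_retry: bool = False
    _data: dict = field(default_factory=dict)

    def get(self, scope: tuple, key: str, default=None):
        return self._data.get(scope, {}).get(key, default)

    def set(self, scope: tuple, key: str, value) -> None:
        self._data.setdefault(scope, {})[key] = value

    def on_retry(self, scope: tuple) -> None:
        # Called between attempts; an XCom-like store wipes the scope here.
        if self.clear_on_retry:
            self._data.pop(scope, None)


class TransientError(Exception):
    """Stands in for a recoverable failure, such as a network blip."""


def process_batches(store, scope, total, fail_at, attempt, processed):
    # Resume from the last checkpoint, or start at batch 0 if there is none.
    start = store.get(scope, "next_batch", 0)
    for batch in range(start, total):
        if attempt == 1 and batch == fail_at:
            raise TransientError(f"simulated failure at batch {batch}")
        processed.append(batch)  # the expensive unit of work
        # Checkpoint only after the batch has fully completed.
        store.set(scope, "next_batch", batch + 1)


def run_with_retries(store, total=5, fail_at=3, max_attempts=2):
    """Simulate a scheduler running one task instance with retries."""
    scope = ("example_dag", "load_batches", "run_1")  # hypothetical IDs
    processed = []
    for attempt in range(1, max_attempts + 1):
        try:
            process_batches(store, scope, total, fail_at, attempt, processed)
            return processed
        except TransientError:
            store.on_retry(scope)
    raise RuntimeError("task failed after exhausting all attempts")
```

With persistence, `run_with_retries(StateStore())` resumes at batch 3 and returns `[0, 1, 2, 3, 4]`; with `StateStore(clear_on_retry=True)` the retry restarts from batch 0 and returns `[0, 1, 2, 0, 1, 2, 3, 4]`, redoing batches 0 to 2. The trade-off Jarek raises is visible here: the resuming version is cheaper on retry, but a retry no longer re-executes the task from a clean slate.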
