I think this is a great direction, but I'd like to see it address Workloads in general, not just Tasks. The executor workload abstraction already exists; `ExecuteTask`, `ExecuteCallback`, and `RunTrigger` are all concrete workload types with shared base classes for key, state, display_name, and bundle routing. Once PR #63491 [1] is merged, a good chunk of the workflow will be unified between Task Instances and Executor Callbacks, and Sebastian is working on moving the Dag Processor callbacks over to the executor as well [2], which will further extend the existing executor workload abstraction. I think the new state system can be implemented in a way that lets all of those take advantage of it.
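To make the shared-abstraction point a bit more concrete, here is a minimal Python sketch of what a workload-generic state change could look like. It is not Airflow code. `Workload`, `TaskWorkload`, `CallbackWorkload`, `GenericExecutor`, `TaskState`, `CallbackState` and `WorkloadStateT` are hypothetical stand-ins, loosely modeled on the shared key/state base classes above and on the `success_state`/`failure_state` properties mentioned below. Real Airflow signatures will differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Hashable
from dataclasses import dataclass
from enum import Enum
from typing import Union


# Hypothetical state enums: stand-ins, not Airflow's real TaskInstanceState etc.
class TaskState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"


class CallbackState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"


# Hypothetical alias playing the role of a workload-agnostic state type.
WorkloadStateT = Union[TaskState, CallbackState]


@dataclass(frozen=True)
class Workload(ABC):
    """Hypothetical shared base: every workload has a hashable key and
    declares which state means success or failure for its own kind."""

    key: Hashable

    @property
    @abstractmethod
    def success_state(self) -> WorkloadStateT: ...

    @property
    @abstractmethod
    def failure_state(self) -> WorkloadStateT: ...


@dataclass(frozen=True)
class TaskWorkload(Workload):
    @property
    def success_state(self) -> TaskState:
        return TaskState.SUCCESS

    @property
    def failure_state(self) -> TaskState:
        return TaskState.FAILED


@dataclass(frozen=True)
class CallbackWorkload(Workload):
    @property
    def success_state(self) -> CallbackState:
        return CallbackState.SUCCESS

    @property
    def failure_state(self) -> CallbackState:
        return CallbackState.FAILED


class GenericExecutor:
    """Hypothetical executor whose bookkeeping is not tied to task-instance keys."""

    def __init__(self) -> None:
        self.event_buffer: dict[Hashable, WorkloadStateT] = {}

    def change_state(self, key: Hashable, state: WorkloadStateT) -> None:
        # Accepts any workload key and any workload state, not only task instances.
        self.event_buffer[key] = state

    def success(self, workload: Workload) -> None:
        self.change_state(workload.key, workload.success_state)

    def fail(self, workload: Workload) -> None:
        self.change_state(workload.key, workload.failure_state)
```

With this shape, a new workload type only needs a subclass that declares its own success and failure states. The executor's `change_state` does not need a new branch per workload kind.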
The `BaseDagBundleWorkload` base class already defines abstract `success_state`/`failure_state` properties and a `WorkloadState` type alias. LocalExecutor and CeleryExecutor both already support them, and the remaining executors are being adapted [3]. The main gap is that `BaseExecutor.change_state()` and a few other helpers are still hardcoded to `TaskInstanceKey`/`TaskInstanceState`, which seems like exactly the kind of thing this AIP can address generically. If you think that's too big a scope for this AIP, I'd like you to at least consider it as possible future work and try to design this AIP so that it can later be extended into Workload State Management.

[1] https://github.com/apache/airflow/pull/63491
[2] https://lists.apache.org/thread/o0z8v01v9qq26r6qmvx8zwbkmho1fnbg
[3] https://github.com/apache/airflow/issues/62887

 - ferruzzi

________________________________
From: Vincent Beck <[email protected]>
Sent: Monday, March 23, 2026 7:32 AM
To: [email protected] <[email protected]>
Subject: RE: [EXT] [DISCUSS] Task State Management

+1. I support this new AIP 100%. It is very much needed for event-driven scheduling, but also for the other scenarios well described in the AIP.

On 2026/03/23 08:19:35 Amogh Desai wrote:
> Thanks Vikram, XD, and Jake for the proposal. It covers all the angles I can think of at the moment, and I appreciate how it brings the various patterns together.
>
> The AIP really covers good breadth and depth too.
> +1 from me on this, and I hope we can see this one in action. Happy to help with any efforts here.
>
> Thanks & Regards,
> Amogh Desai
>
> On Mon, Mar 23, 2026 at 4:45 AM Xiaodong Deng <[email protected]> wrote:
> > Thanks, Vikram, for helping bring all the efforts together.
> > And thanks, everyone, for your positive feedback.
> >
> > I have been discussing this proposal with Vikram offline, and I'm quite confident it is going to resolve quite a few pending issues and inconveniences in how people use Airflow, or at least help people avoid unnecessary hacks.
> >
> > As far as I can see, it will be able to cover all the use cases I brought up in my earlier draft AIP, "Add 'persist_xcom_through_retry' Parameter to Airflow Operators" (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333). That's also why I look forward even more to seeing this Task State Management feature in a near-future version of Airflow.
> >
> > Would love to receive more feedback from the community. Vikram, Jake, and I are looking forward to working with everyone to bring this thrilling feature to life.
> >
> > Regards,
> > XD
> >
> > On 2026/03/22 06:48:10 Rahul Vats wrote:
> > > Thanks, Vikram, for bringing this up.
> > >
> > > A big +1 from me as well. The three patterns you mentioned are very real; I have seen users stretch XCom in all sorts of ways to fill exactly these gaps.
> > > The clean separation from XCom, with different scoping and lifecycle, makes a lot of sense. I will go through the AIP doc in detail.
> > >
> > > Thanks,
> > > Rahul
> > >
> > > On Sun, 22 Mar 2026 at 01:39, Jens Scheffler <[email protected]> wrote:
> > > > Thanks Vikram, Jake, XD also from my side!
> > > >
> > > > A big +1 for moving this forward, and I think this is really important.
> > > > Though, from reading it over, I do not see why it is marked as DRAFT; apart from nits, I think it is already very mature. Everything I saw is generally "right". So I hope this discussion will not be controversial and that we can get this into 3.3!
> > > >
> > > > (Some could say this concept is overdue... but it is important to have!)
> > > >
> > > > Jens
> > > >
> > > > On 21.03.26 20:58, Jarek Potiuk wrote:
> > > > > Thanks Vikram,
> > > > >
> > > > > This is a crucial AIP for Airflow 3.3+. I skimmed through it and will provide more comments over the coming days, but it very much looks like what I imagined for state management in Airflow.
> > > > > It has about the right abstraction layer, focusing on building infrastructure that serves the previously articulated use cases and likely supports other use cases we are not yet aware of. I really like how it maps the "generic" interface onto those cases.
> > > > >
> > > > > I have this old "rule of thumb": you need at least three use cases to be able to design a truly reusable infrastructure API/component. Here we have three use cases it will serve :)
> > > > >
> > > > > Jl
> > > > >
> > > > > On Sat, Mar 21, 2026 at 8:44 PM Vikram Koka via dev <[email protected]> wrote:
> > > > >> Dear Airflowers,
> > > > >>
> > > > >> Over the last several months, there have been a lot of discussions on the devlist around improvements needed for long-running jobs outside of Airflow (raised by XD and others), and about improved event triggering (raised by Jake and others). XD, Jake, and I have gotten together and collaborated on a unified approach for Task State Management within Airflow, which we would like to propose.
> > > > >>
> > > > >> Apache Airflow has been built around stateless, idempotent tasks, and this has served the community incredibly well. But as production AI and data workloads have grown more sophisticated, a clear gap has emerged that the community has been working around for a while.
> > > > >>
> > > > >> Three patterns keep coming up. An incremental operator needs to know where it left off last time, so it does not reprocess data it has already handled. An operator running a Databricks or EMR job needs to survive a worker disruption without cancelling a job that was 90% complete and starting over from scratch. A long-running async task processing thousands of files needs to checkpoint its progress so a retry picks up where it left off, not from the beginning.
> > > > >>
> > > > >> All three patterns are forcing users into the same workarounds today: generally bending XCom beyond its intended purpose, or building their own state persistence outside of Airflow entirely.
> > > > >>
> > > > >> We think we can do better. AIP-XX: Task State Management is a new foundational AIP that addresses all three patterns through a single, minimal, pluggable framework. Built on top of the Execution API from AIP-72, with full async support consistent with AIP-98, Task State is deliberately and cleanly separate from XCom, with different scoping, different lifecycle semantics, and different garbage collection mechanics.
It also > > provides > > > > the > > > > >> foundation for a simplified AIP-93 (Asset Watermarking) > > > > >> < > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables > > > > >> and for long running remote operations using either the AIP-tbd > > > > Persistent > > > > >> Parameter for Airflow Operators > > > > >> < > > > > >> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333 > > > > >> or AIP-96 (Resumable Operators) > > > > >> < > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-96+Resumable+Operators > > > > >> . > > > > >> > > > > >> Full draft is on Confluence as Draft AIP-xx: Task State Management > > > > >> < > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/Draft%3A+AIP-xx%3A+Task+State+Management > > > > >> We would love to hear your thoughts. Please comment on the AIP doc. > > > > >> > > > > >> Best regards, > > > > >> Vikram, XD, and Jake > > > > >> -- > > > > >> > > > > >> Vikram Koka > > > > >> Chief Strategy Officer > > > > >> Email: [email protected] > > > > >> > > > > >> > > > > >> <https://www.astronomer.io/> > > > > >> > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: [email protected] > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [email protected] > > For additional commands, e-mail: [email protected] > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
