Thanks for the reply. I look forward to seeing how it all works out.

- ferruzzi
________________________________
From: Vikram Koka via dev <[email protected]>
Sent: Monday, March 30, 2026 7:28 PM
To: [email protected] <[email protected]>
Cc: Vikram Koka <[email protected]>
Subject: RE: [EXT] [DISCUSS] Task State Management

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

Hi Dennis,

Thank you for this. I want to address the workload generalization point more precisely, because I think there is an important distinction that clarifies where AIP-103 sits.

AIP-103 is fundamentally a DAG-author-facing API. The scopes it defines correspond directly to concepts that DAG authors declare and interact with: tasks (via context['task_state']) and assets (via context['asset_state']). Both TaskScope and AssetScope have clear natural primary keys and well-understood DAG-author use cases. However, they are built on a generic underlying mechanism that can support additional scopes when the time is right, so we are not locked in and do not need to decide on everything now.

BaseDagBundleWorkload and WorkloadKey are a different layer entirely. They are internal to the scheduler and executor. DAG authors do not declare workloads, do not interact with them, and have no context['workload_state'] surface. When a DAG author writes @task def my_func(), the scheduler creates an ExecuteTask workload behind the scenes, but that is invisible to the author. The workload abstraction is the right place to unify executor mechanics, but it is not the DAG-author API that AIP-103 is designing.
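
As a rough illustration of the layering described above (not the AIP-103 design itself), here is a minimal sketch of scopes sitting on one generic store. TaskScope and AssetScope are named in this thread, but their fields are assumptions, and ScopedStateStore, CallbackScope, and every key and value shown are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskScope:
    """Hypothetical key fields: state owned by one task in one DAG."""
    dag_id: str
    task_id: str


@dataclass(frozen=True)
class AssetScope:
    """Hypothetical key field: state owned by one asset."""
    asset_name: str


class ScopedStateStore:
    """Hypothetical generic store; it only needs scopes to be hashable."""

    def __init__(self) -> None:
        self._data: dict[tuple[Any, str], Any] = {}

    def set(self, scope: Any, key: str, value: Any) -> None:
        self._data[(scope, key)] = value

    def get(self, scope: Any, key: str, default: Any = None) -> Any:
        return self._data.get((scope, key), default)


store = ScopedStateStore()
store.set(TaskScope("etl", "load"), "last_offset", 1200)
store.set(AssetScope("orders"), "watermark", "2026-03-01")


# A scope added later plugs in without any change to the store.
@dataclass(frozen=True)
class CallbackScope:
    dag_id: str
    callback_name: str


store.set(CallbackScope("etl", "on_failure"), "attempts", 1)
```

The point of the sketch is only that the store never inspects the scope type, so deciding on a new scope can wait until its natural primary key is known.
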
SyncCallback and AsyncCallback are DAG-author-visible, but a callback is a one-shot function call with no cross-retry state persistence requirement.

I think the right approach is to wait until the callbacks and workloads design stabilizes further. Once it does, we can identify the natural semantic primary key for each workload type, assess the key use cases that need state persistence from a DAG-author perspective, and add the appropriate scope at that point. Since the underlying task_state mechanism is generic, I am confident that we can add additional scopes as needed.

Vikram

On Mon, Mar 23, 2026 at 5:08 PM Ferruzzi, Dennis <[email protected]> wrote:

> I think this is a great direction, but I'd like to see it address Workloads in general, not just Tasks. The executor workload abstraction already exists; `ExecuteTask`, `ExecuteCallback`, and `RunTrigger` are all concrete workload types with shared base classes for key, state, display_name, and bundle routing. Once PR #63491 [1] is merged, a good chunk of the workflow will be unified between Task Instances and Executor Callbacks, and Sebastian is working on moving the Dag Processor callbacks over to the executor as well, [2] which will further extend the existing executor workload abstraction. I think the new state system can be implemented in a way that all of those can take advantage of it.
>
> The `BaseDagBundleWorkload` base class already defines abstract `success_state`/`failure_state` properties and a `WorkloadState` type alias. LocalExecutor and Celery Executor both already support them with every other executor already being adapted [3]. The main gap is that `BaseExecutor.change_state()` and a few other helpers are still hardcoded to `TaskInstanceKey`/`TaskInstanceState` which seems like exactly the kind of thing this AIP can address generically.
>
> If you think that's too big of a scope for this AIP, I'd like you to at least consider that it may be future work and attempt to design it in a way that it can later be extended to a Workload State Management.
>
> [1] https://github.com/apache/airflow/pull/63491
> [2] https://lists.apache.org/thread/o0z8v01v9qq26r6qmvx8zwbkmho1fnbg
> [3] https://github.com/apache/airflow/issues/62887
>
> - ferruzzi
>
> ------------------------------
> *From:* Vincent Beck <[email protected]>
> *Sent:* Monday, March 23, 2026 7:32 AM
> *To:* [email protected] <[email protected]>
> *Subject:* RE: [EXT] [DISCUSS] Task State Management
>
> +1. I support 100% this new AIP. This is very much needed for event driven scheduling but also for the other scenarios well described in this AIP.
>
> On 2026/03/23 08:19:35 Amogh Desai wrote:
> > Thanks Vikram, XD, and Jake for the proposal. It covers all the angles I can think of at the time being and I appreciate merging together the various patterns.
> >
> > The AIP really covers good breadth and depth too. +1 from me on this, and I hope we can see this one in action. Happy to help with any efforts here.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Mon, Mar 23, 2026 at 4:45 AM Xiaodong Deng <[email protected]> wrote:
> >
> > > Thanks, Vikram, for helping bring all the efforts together.
> > > And thanks, everyone, for your positive feedback.
> > >
> > > I have been discussing this proposal with Vikram offline, and I'm quite confident it is going to resolve quite a few pending issues and inconveniences in how people use Airflow, or at least help people avoid unnecessary hacks.
> > >
> > > As far as I could see, it will be able to cover all the use cases I brought up in my earlier draft AIP, "Add 'persist_xcom_through_retry' Parameter to Airflow Operators" (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333). That's also why I look forward even more to seeing this Task State Management feature in a very near future version of Airflow.
> > >
> > > Would love to receive more feedback from the community. Vikram, Jake, and I are looking forward to working with everyone to bring this thrilling feature to life.
> > >
> > > Regards,
> > > XD
> > >
> > > On 2026/03/22 06:48:10 Rahul Vats wrote:
> > > > Thanks, Vikram for bringing this up.
> > > >
> > > > A big +1 from me as well. The three patterns you mentioned are very real, I have seen users stretch XCom in all sorts of ways to fill exactly these gaps.
> > > > The clean separation from XCom with different scoping and lifecycle makes a lot of sense. Will go through the AIP doc in detail.
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> > > > On Sun, 22 Mar 2026 at 01:39, Jens Scheffler <[email protected]> wrote:
> > > >
> > > > > Thanks Vikram, Jake, XD also from my side!
> > > > >
> > > > > A big +1 for moving this forward and I think this is really important. Though from reading over it I do not see why it is marked as DRAFT, because besides nt I think it is already very mature. All what I saw is in general "right". So I hope this is not really a controversial discussion and then we can get this in 3.3!
> > > > >
> > > > > (Some could say this concept is overdue...
> > > > > but is important to have!)
> > > > >
> > > > > Jens
> > > > >
> > > > > On 21.03.26 20:58, Jarek Potiuk wrote:
> > > > > > Thanks Vikram,
> > > > > >
> > > > > > This is a crucial AIP for Airflow 3.3+. I skimmed through it and will provide more comments over the coming days, but it very much looks like what I imagined for state management in Airflow.
> > > > > > It has about the right abstraction layer, focusing on building infrastructure that serves the previously articulated use cases and likely supports other use cases we are not yet aware of. I really like how it maps the "generic" interface into those cases.
> > > > > >
> > > > > > I have this old "rule of thumb": you need at least three use cases to be able to design a truly reusable infrastructure API/component. Here we have 3 use cases it will serve :)
> > > > > >
> > > > > > Jl
> > > > > >
> > > > > > On Sat, Mar 21, 2026 at 8:44 PM Vikram Koka via dev <[email protected]> wrote:
> > > > > >
> > > > > >> Dear Airflowers,
> > > > > >>
> > > > > >> Over the last several months, there have been a lot of discussions in the devlist around improvements needed for long running jobs outside of Airflow (raised by XD and others), and about improved event triggering (raised by Jake and others). XD, Jake, and I have gotten together and collaborated on a unified approach for Task State Management within Airflow which we would like to propose.
> > > > > >>
> > > > > >> Apache Airflow has been built around stateless, idempotent tasks, and this has served the community incredibly well.
> > > > > >> But as production AI and data workloads have grown more sophisticated, a clear gap has emerged that the community has been working around for a while.
> > > > > >>
> > > > > >> Three patterns keep coming up. An incremental operator needs to know where it left off last time, so it does not reprocess data it has already handled. An operator running a Databricks or EMR job needs to survive a worker disruption without cancelling a job that was 90% complete and starting over from scratch. A long-running async task processing thousands of files needs to checkpoint its progress so a retry picks up where it left off, not from the beginning.
> > > > > >>
> > > > > >> All three patterns are forcing users into the same workarounds today, generally bending XCom beyond its intended purpose, or building their own state persistence outside of Airflow entirely.
> > > > > >>
> > > > > >> We think we can do better. AIP-XX: Task State Management is a new foundation AIP that addresses all three patterns through a single, minimal, pluggable framework. Built on top of the Execution API from AIP-72, with full async support consistent with AIP-98, Task State is deliberately and cleanly separate from XCom, with different scoping, different lifecycle semantics, and different garbage collection mechanics.
> > > > > >> It also provides the foundation for a simplified AIP-93 (Asset Watermarking) <https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables> and for long running remote operations using either the AIP-tbd Persistent Parameter for Airflow Operators <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333> or AIP-96 (Resumable Operators) <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-96+Resumable+Operators>.
> > > > > >>
> > > > > >> Full draft is on Confluence as Draft AIP-xx: Task State Management <https://cwiki.apache.org/confluence/display/AIRFLOW/Draft%3A+AIP-xx%3A+Task+State+Management>
> > > > > >> We would love to hear your thoughts. Please comment on the AIP doc.
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Vikram, XD, and Jake
> > > > > >> --
> > > > > >>
> > > > > >> Vikram Koka
> > > > > >> Chief Strategy Officer
> > > > > >> Email: [email protected]
> > > > > >>
> > > > > >> <https://www.astronomer.io/>
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [email protected]
> > > > > For additional commands, e-mail: [email protected]
