Thanks for the reply.  I look forward to seeing how it all works out.

- ferruzzi

________________________________
From: Vikram Koka via dev <[email protected]>
Sent: Monday, March 30, 2026 7:28 PM
To: [email protected] <[email protected]>
Cc: Vikram Koka <[email protected]>
Subject: RE: [EXT] [DISCUSS] Task State Management

Hi Dennis,


Thank you for this. I want to address the workload generalization point
more precisely, because I think there is an important distinction that
clarifies where AIP-103 sits.


AIP-103 is fundamentally a DAG-author-facing API.

The scopes it defines correspond directly to concepts that DAG authors
declare and interact with: tasks (via context['task_state']) and assets (via
context['asset_state']). Both TaskScope and AssetScope have clear natural
primary keys and well-understood DAG-author use cases.


However, they are built on a generic underlying mechanism that can support
additional scopes when the time is right, so we are not locked in and do
not need to decide on everything now.
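
As a rough illustration of "scopes on top of a generic mechanism" (a sketch
only: the field names on TaskScope and AssetScope, the StateStore class, and
its get/set methods are assumptions made for this example, not the API the
AIP defines), the store itself can stay ignorant of scope types and key
purely on a hashable scope object:

```python
from dataclasses import dataclass
from typing import Any, Hashable


@dataclass(frozen=True)
class TaskScope:
    """Hypothetical primary key for task-scoped state (fields are assumptions)."""
    dag_id: str
    task_id: str


@dataclass(frozen=True)
class AssetScope:
    """Hypothetical primary key for asset-scoped state (field is an assumption)."""
    asset_uri: str


class StateStore:
    """Toy in-memory stand-in for the generic mechanism.

    It only requires that a scope is hashable, so a new scope type can be
    added later without changing the store.
    """

    def __init__(self) -> None:
        self._data: dict[tuple[Hashable, str], Any] = {}

    def set(self, scope: Hashable, key: str, value: Any) -> None:
        self._data[(scope, key)] = value

    def get(self, scope: Hashable, key: str, default: Any = None) -> Any:
        return self._data.get((scope, key), default)


store = StateStore()
task_scope = TaskScope(dag_id="etl", task_id="load")
asset_scope = AssetScope(asset_uri="s3://bucket/events")

# The same store serves both scopes; entries never collide across scope types.
store.set(task_scope, "job_id", "run-123")
store.set(asset_scope, "watermark", "2026-03-01")
```

Adding a further scope later would then mean defining one more frozen
dataclass with its own natural key, leaving the store untouched.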


BaseDagBundleWorkload and WorkloadKey are a different layer entirely. They
are internal to the scheduler and executor.

DAG authors do not declare workloads, do not interact with them, and have
no context['workload_state'] surface.

When a DAG author writes @task def my_func(), the scheduler creates an
ExecuteTask workload behind the scenes, but that is invisible to the
author.

The workload abstraction is the right place to unify executor mechanics,
but it is not the DAG-author API that AIP-103 is designing.


SyncCallback and AsyncCallback are DAG-author-visible, but a callback is a
one-shot function call with no cross-retry state persistence requirement.
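
For contrast, here is a minimal sketch of what cross-retry state persistence
means for a task (the checkpoint-and-resume pattern from the original
proposal). The plain dict stands in for whatever task-scoped state the AIP
ends up exposing; process_files, fail_at, and TransientError are
hypothetical names invented for this example.

```python
class TransientError(Exception):
    """Simulated failure that would trigger a retry."""


def process_files(files, state, fail_at=None):
    """Process files in order, checkpointing progress in `state` after each one.

    `state` is assumed to survive across attempts; `fail_at` injects a
    failure at the given index so the retry path can be exercised.
    """
    start = state.get("next_index", 0)  # resume from the last checkpoint
    processed = []
    for i in range(start, len(files)):
        if fail_at is not None and i == fail_at:
            raise TransientError(f"simulated failure at {files[i]}")
        processed.append(files[i])
        state["next_index"] = i + 1  # checkpoint only after the file is done
    return processed


files = ["a.csv", "b.csv", "c.csv", "d.csv"]
state = {}  # stand-in for state persisted between attempts

# First attempt fails at the third file, after checkpointing the first two.
try:
    process_files(files, state, fail_at=2)
except TransientError:
    pass

# The retry picks up at the checkpoint instead of starting over.
resumed = process_files(files, state)
```

A one-shot callback has no second attempt that would read `state` back,
which is why it does not need a scope of its own today.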


I think the right approach is to wait until the callbacks and workloads
design stabilizes further. Once it does, we can identify the natural semantic
primary key for each workload type, assess the key use cases that need
state persistence from a DAG-author perspective, and add the appropriate
scope at that point.

Since the underlying task_state mechanism is generic, I am confident that
we can add additional scopes as needed.


Vikram



On Mon, Mar 23, 2026 at 5:08 PM Ferruzzi, Dennis <[email protected]>
wrote:

> I think this is a great direction, but I'd like to see it address
> Workloads in general, not just Tasks.  The executor workload abstraction
> already exists; `ExecuteTask`, `ExecuteCallback`, and `RunTrigger` are all
> concrete workload types with shared base classes for key, state,
> display_name, and bundle routing.  Once PR #63491 [1] is merged, a good
> chunk of the workflow will be unified between Task Instances and Executor
> Callbacks, and Sebastian is working on moving the Dag Processor callbacks
> over to the executor as well [2], which will further extend the existing
> executor workload abstraction.  I think the new state system can be
> implemented in a way that all of those can take advantage of it.
>
> The `BaseDagBundleWorkload` base class already defines abstract
> `success_state`/`failure_state` properties and a `WorkloadState` type
> alias.  LocalExecutor and CeleryExecutor both already support them, and
> every other executor is already being adapted [3].  The main gap is that
> `BaseExecutor.change_state()` and a few other helpers are still hardcoded
> to `TaskInstanceKey`/`TaskInstanceState` which seems like exactly the kind
> of thing this AIP can address generically.
>
> If you think that's too big of a scope for this AIP, I'd like you to at
> least consider that it may be future work and attempt to design it in a way
> that it can later be extended to a Workload State Management.
>
>
> [1] https://github.com/apache/airflow/pull/63491
> [2] https://lists.apache.org/thread/o0z8v01v9qq26r6qmvx8zwbkmho1fnbg
> [3] https://github.com/apache/airflow/issues/62887
>
> - ferruzzi
>
> ------------------------------
> *From:* Vincent Beck <[email protected]>
> *Sent:* Monday, March 23, 2026 7:32 AM
> *To:* [email protected] <[email protected]>
> *Subject:* RE: [EXT] [DISCUSS] Task State Management
>
> +1. I 100% support this new AIP. It is very much needed for event-driven
> scheduling, but also for the other scenarios well described in this AIP.
>
> On 2026/03/23 08:19:35 Amogh Desai wrote:
> > Thanks Vikram, XD, and Jake for the proposal. It covers all the angles I
> > can think of for the time being, and I appreciate the merging together of
> > the various patterns.
> >
> > The AIP really covers good breadth and depth too. +1 from me on this, and
> > I hope we can see this one in action. Happy to help with any efforts here.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Mon, Mar 23, 2026 at 4:45 AM Xiaodong Deng <[email protected]> wrote:
> >
> > > Thanks, Vikram, for helping bring all the efforts together.
> > > And thanks, everyone, for your positive feedback.
> > >
> > > I have been discussing this proposal with Vikram offline, and I'm quite
> > > confident it is going to resolve quite a few pending issues and
> > > inconveniences in how people use Airflow, or at least help people avoid
> > > unnecessary hacks.
> > >
> > > As far as I could see, it will be able to cover all the use cases I
> > > brought up in my earlier draft AIP, "Add 'persist_xcom_through_retry'
> > > Parameter to Airflow Operators"
> > > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333).
> > > That's also why I look forward even more to seeing this Task State
> > > Management feature in a very near future version of Airflow.
> > >
> > > Would love to receive more feedback from the community. Vikram, Jake, and
> > > I are looking forward to working with everyone to bring this thrilling
> > > feature to life.
> > >
> > > Regards,
> > > XD
> > >
> > > On 2026/03/22 06:48:10 Rahul Vats wrote:
> > > > > Thanks, Vikram, for bringing this up.
> > > > >
> > > > > A big +1 from me as well. The three patterns you mentioned are very
> > > > > real; I have seen users stretch XCom in all sorts of ways to fill
> > > > > exactly these gaps.
> > > > > The clean separation from XCom with different scoping and lifecycle
> > > > > makes a lot of sense. Will go through the AIP doc in detail.
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> > > >
> > > > > On Sun, 22 Mar 2026 at 01:39, Jens Scheffler <[email protected]> wrote:
> > > >
> > > > > Thanks Vikram, Jake, XD also from my side!
> > > > >
> > > > > > A big +1 for moving this forward and I think this is really important.
> > > > > > Though from reading over it I do not see why it is marked as DRAFT,
> > > > > > because besides nits I think it is already very mature. All that I saw
> > > > > > is in general "right". So I hope this is not a really controversial
> > > > > > discussion and then we can get this in 3.3!
> > > > > >
> > > > > > (Some could say this concept is overdue... but it is important to have!)
> > > > >
> > > > > Jens
> > > > >
> > > > > On 21.03.26 20:58, Jarek Potiuk wrote:
> > > > > > Thanks Vikram,
> > > > > >
> > > > > > > This is a crucial AIP for Airflow 3.3+. I skimmed through it and will
> > > > > > > provide more comments over the coming days, but it very much looks
> > > > > > > like what I imagined for state management in Airflow.
> > > > > > > It has about the right abstraction layer, focusing on building
> > > > > > > infrastructure that serves the previously articulated use cases and
> > > > > > > likely supports other use cases we are not yet aware of. I really
> > > > > > > like how it maps the "generic" interface into those cases.
> > > > > > >
> > > > > > > I have this old "rule of thumb": you need at least three use cases
> > > > > > > to be able to design a truly reusable infrastructure API/component.
> > > > > > > Here we have 3 use cases it will serve :)
> > > > > >
> > > > > > Jl
> > > > > >
> > > > > >
> > > > > > > On Sat, Mar 21, 2026 at 8:44 PM Vikram Koka via dev <[email protected]> wrote:
> > > > > >
> > > > > >> Dear Airflowers,
> > > > > >>
> > > > > > >> Over the last several months, there have been a lot of discussions
> > > > > > >> in the devlist around improvements needed for long running jobs
> > > > > > >> outside of Airflow (raised by XD and others), and about improved
> > > > > > >> event triggering (raised by Jake and others). XD, Jake, and I have
> > > > > > >> gotten together and collaborated on a unified approach for Task
> > > > > > >> State Management within Airflow which we would like to propose.
> > > > > >>
> > > > > > >> Apache Airflow has been built around stateless, idempotent tasks,
> > > > > > >> and this has served the community incredibly well. But as
> > > > > > >> production AI and data workloads have grown more sophisticated, a
> > > > > > >> clear gap has emerged that the community has been working around
> > > > > > >> for a while.
> > > > > >>
> > > > > > >> Three patterns keep coming up. An incremental operator needs to
> > > > > > >> know where it left off last time, so it does not reprocess data it
> > > > > > >> has already handled. An operator running a Databricks or EMR job
> > > > > > >> needs to survive a worker disruption without cancelling a job that
> > > > > > >> was 90% complete and starting over from scratch. A long-running
> > > > > > >> async task processing thousands of files needs to checkpoint its
> > > > > > >> progress so a retry picks up where it left off, not from the
> > > > > > >> beginning.
> > > > > >>
> > > > > > >> All three patterns are forcing users into the same workarounds
> > > > > > >> today: generally bending XCom beyond its intended purpose, or
> > > > > > >> building their own state persistence outside of Airflow entirely.
> > > > > >>
> > > > > > >> We think we can do better. AIP-XX: Task State Management is a new
> > > > > > >> foundation AIP that addresses all three patterns through a single,
> > > > > > >> minimal, pluggable framework. Built on top of the Execution API
> > > > > > >> from AIP-72, with full async support consistent with AIP-98, Task
> > > > > > >> State is deliberately and cleanly separate from XCom, with
> > > > > > >> different scoping, different lifecycle semantics, and different
> > > > > > >> garbage collection mechanics. It also provides the foundation for
> > > > > > >> a simplified AIP-93 (Asset Watermarking)
> > > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables>
> > > > > > >> and for long running remote operations using either the AIP-tbd
> > > > > > >> Persistent Parameter for Airflow Operators
> > > > > > >> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333>
> > > > > > >> or AIP-96 (Resumable Operators)
> > > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-96+Resumable+Operators>.
> > > > > >>
> > > > > > >> Full draft is on Confluence as Draft AIP-xx: Task State Management
> > > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/Draft%3A+AIP-xx%3A+Task+State+Management>.
> > > > > > >> We would love to hear your thoughts. Please comment on the AIP doc.
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Vikram, XD, and Jake
> > > > > >> --
> > > > > >>
> > > > > >> Vikram Koka
> > > > > >> Chief Strategy Officer
> > > > > >> Email: [email protected]
> > > > > >>
> > > > > >>
> > > > > >> <https://www.astronomer.io/>
> > > > > >>
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [email protected]
> > > > > For additional commands, e-mail: [email protected]
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
>
