Those are all questions that we will open up for discussion once we get the basic foundation :)
On Wed, Aug 13, 2025 at 10:16 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Awesome, having a standard approach for all kinds of authentication would be great, looking forward to it.

BTW, on a side note, I see that as of now, things like Connections, Variables, and XComs which are present under the Execution API namespace don't have any authorization model (left with TODOs), so is there any plan for how they will work? Because we might need something similar for Dag-processor/Triggerer as well.

Also, as we've decided to create a different API namespace and not use the execution namespace for hosting the new APIs required by dag-processor/triggerer, do we have to copy these classes & routes, or at least refactor them so they can serve all API namespaces?

On Wed, Aug 13, 2025 at 1:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Will do. We are also discussing - for now, within the security team - the various aspects of the authentication approach we want to have - for both "UI/User authentication" as well as "Long running services" - and how they relate to token exchange, invalidation and other scenarios. What we will come up with - I hope shortly - is a proposal for a general "model" of all kinds of security and authentication scenarios, so that we do not have to reinvent the wheel and try to figure out all the aspects with individual AIPs. We are not far from bringing it to open discussion on the devlist, but we have to be careful about some of the aspects that we might need to improve in the current setup to close some - small - loopholes, so bear with us :)

On Wed, Aug 13, 2025 at 9:33 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Thanks Ash and Jarek, for the detailed comments. I largely agree with the points mentioned by you guys, hence I updated the AIP and added a section on Authentication, API Versioning, and Packaging as well.
Please go through it once more and let me know if there are more things to consider before I open it for voting.

On Thu, Aug 7, 2025 at 6:03 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Also:

> 1. The Authentication token. How will this long-lived token work without being insecure? Who and what will generate it? How will we identify top-level requests for Variables in order to be able to add Variable RBAC/ACLs? This is an important enough thing that I think it needs discussion before we vote on this AIP.

We are currently discussing - in the security team - an approach for JWT token handling, so likely we could move the discussion there; it does have some security implications and I think we should bring our findings to the devlist when we complete it, but I think we should add this case there. IMHO we should have a different approach for UI, different for Tasks, different for Triggerer, and different for DagProcessor (possibly the Triggerer and DagProcessor could be the same because they share essentially the same long-living token). Ash - I will add this to the discussion there.

J.

On Thu, Aug 7, 2025 at 2:23 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Ah.. So if we are talking about a more complete approach - seeing those comments from Ash - it makes me think we should have another (connected) AIP about splitting the distributions. We have never finalized it (nor even discussed it), but Ash - you had some initial document for that. So maybe we should finalize it and, rather than specify it in this AIP, have a separate AIP about the distribution split that AIP-92 could depend on.
It seems much more reasonable to split "distribution and code split" from parsing isolation, I think, and implement them separately/in parallel.

Reading Ash's comments (and maybe I am going a bit further than Ash), it calls for something that I am a big proponent of - splitting "airflow-core" and having different "scheduler", "webserver", "dag processor" and "triggerer" distributions. Now we have the capability of having "shared" code - we do not need "common" code to make it happen - because we can share code.

What it could give us - on top of a clean client/server split - is that we could have different dependencies used by those distributions. Additionally, we could also split off the executors from providers and finally implement it in such a way that the scheduler does not use providers at all (not even the cncf.kubernetes or celery providers installed in the scheduler or webserver, but "executors" packages instead). The code sharing approach with symlinks we have now will make it a .... breeze :) . That would also imply sharing "connection" definitions through the DB, and likely finally implementing the "test connection" feature properly (i.e. executing test connection in the worker / triggerer rather than in the web server, which is the reason why we disabled it by default now). This way the "api-server" would not need any of the providers to be installed either, which IMHO is the biggest win from a security point of view.
And the nice thing about it is that it would be rather transparent when anyone uses "pip install apache-airflow" - it would behave exactly the same, no more complexity involved, simply more distributions installed when the "apache-airflow" meta-distribution is used. But it would allow those who want to implement a more complex and secure setup to have different "environments" with modularized pieces of Airflow installed - only "apache-airflow-dag-processor + task-sdk + providers" where the dag-processor is run, only "apache-airflow-scheduler + executors" where the scheduler is installed, only "apache-airflow-task-sdk + providers" where workers are running, only "apache-airflow-api-server" where the api-server is running, and only "apache-airflow-triggerer + task-sdk + providers" where the triggerer is running.

I am happy (Ash, if you are fine with that) to take that original document over and lead this part and the new AIP to completion (including implementation). I am very much convinced that this will lead to much better dependency security and more modular code without impacting the "apache-airflow" installation complexity.

If we do it this way, the code/clean-split part would be "delegated out" from AIP-92 to this new AIP and turned into a dependency.

J.

On Thu, Aug 7, 2025 at 1:51 PM Ash Berlin-Taylor <a...@apache.org> wrote:

This AIP is definitely heading in the right direction and is a feature I'd like to see.

For me the outstanding things that need more detail:

1. The Authentication token. How will this long-lived token work without being insecure? Who and what will generate it?
How will we identify top-level requests for Variables in order to be able to add Variable RBAC/ACLs? This is an important enough thing that I think it needs discussion before we vote on this AIP.
2. Security generally — how will this work, especially with multi-team? I think this likely means making the APIs work at the bundle level as you mention in the doc, but I haven't thought deeply about this yet.
3. API Versioning? One of the key driving goals with AIP-72 and the Task Execution SDK was the idea that "you can upgrade the API server as you like, and your clients/workers never need to change" — i.e. the API server is 100% working with all older versions of the TaskSDK. I don't know if we will achieve that goal in the long run but it is the desire, and part of why we are using CalVer and the Cadwyn library to provide API versioning.
4. As mentioned previously, I'm not sure the existing serialised JSON format for DAGs is correct, but since that now has a version, and we already have the ability to upgrade that somewhere in the Airflow Core, it doesn't necessarily become a blocker/pre-requisite for this AIP.

I think the Dag parsing API client + submission + parsing process manager should either live in the Task SDK dist, or in a new separate dist that uses TaskSDK, but crucially not in apache-airflow-core.
My reason for this is that I want it to be possible for the server components (scheduler, API server) to not need task-sdk installed (just for cleanliness/avoiding confusion about what versions it needs) and also, vice versa, to be able to run a "team worker bundle" (Dag parsing, workers, triggerer/async workers) on whatever version of TaskSDK they choose, again without apache-airflow-core installed, for avoidance of doubt.

Generally I would like this as it means we can have a nicer separation of Core and Dag parsing code; as the dag parsing itself uses the SDK, it would be nice to have a proper server/client split, both from a tighter security point of view and from a code layout point of view.

-ash

On 7 Aug 2025, at 12:36, Jarek Potiuk <ja...@potiuk.com> wrote:

Well, you started it - so it's up to you to decide if you think we have consensus, or whether we need a vote.

And it's not a question of an "informal" vote; it's rather clear, following https://www.apache.org/foundation/voting.html, that we either need a LAZY CONSENSUS or a VOTE thread. Both are formal.

This is the difficult part when you have a proposal: to assess (by you) whether we are converging to consensus or whether a vote is needed. There is no other body or "authority" to do it for you.

J.
On Thu, Aug 7, 2025 at 1:02 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Sorry for nudging again, but can we reach some consensus on this? I mean, if this AIP isn't good enough, then we can drop it altogether and someone can rethink the whole thing. Should we do some kind of informal voting and close this thread?

On Mon, Aug 4, 2025 at 3:32 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

I understand Ash's concerns - the format has not been designed with size/speed optimization in mind, so **possibly** we could design a different format that would be better suited.

BUT ... Done is better than perfect.

I think there are a number of risks involved in changing the format, and it could significantly increase development time with uncertain gains at the end - also because of the progress in compression that has happened over the last few years.

It might be a good idea to experiment a bit with different compression algorithms for "our" dag representation, and possibly we could find the best algorithm for the "airflow dag" type of json data.
There are a lot of repetitions in the JSON representation, and I guess in "our" json representation there are some artifacts and repeated sections that simply might compress well with different algorithms. Also, in this case speed matters (and the CPU trade-off).

Looking at compression "theory" - before we experiment with it - there is the relatively new "zstandard" standard https://github.com/facebook/zstd, a compression algorithm open-sourced in 2016, which I've heard good things about - especially that it maintains a very good compression rate for text data, but is also tremendously fast - especially for decompression (which is a super important factor for us - in the general case we compress a new DAG representation far less often than we decompress it). It is standardized in RFC https://datatracker.ietf.org/doc/html/rfc8878, there are various implementations, it is even being added to the Python standard library in Python 3.14 https://docs.python.org/3.14/library/compression.zstd.html, and there is a very well maintained python binding library https://pypi.org/project/zstd/ to Yann Collet's (the algorithm author's) ZSTD C library. And libzstd is already part of our images - it is needed by other dependencies of ours. All with a BSD licence, directly usable by us.
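[Editor's sketch] To get a feel for the ratios being discussed on repetitive DAG-style JSON, here is a minimal experiment. It uses the stdlib zlib module as a stand-in, since compression.zstd only lands in Python 3.14 - a zstd compress/decompress call would slot in the same way. The payload shape is invented for illustration, not the real Airflow serialization schema.

```python
import json
import zlib

# A deliberately repetitive payload shaped like a serialized DAG: many tasks
# with near-identical structure, and strings where smaller types would do.
# (Illustrative shape only - not the real Airflow serialization schema.)
dag = {
    "dag_id": "example",
    "tasks": [
        {
            "task_id": f"task_{i}",
            "operator": "PythonOperator",
            "retries": "0",
            "depends_on_past": "False",
            "pool": "default_pool",
        }
        for i in range(500)
    ],
}

raw = json.dumps(dag).encode("utf-8")
packed = zlib.compress(raw, 6)  # a zstd binding would be a drop-in here

ratio = len(raw) / len(packed)
print(f"raw={len(raw)} B, packed={len(packed)} B, ratio={ratio:.1f}x")

# Round-trip: decompressing and re-parsing yields the same dict.
assert json.loads(zlib.decompress(packed).decode("utf-8")) == dag
```

Highly repetitive structures like this compress by an order of magnitude even with zlib, which supports the point that the format itself may matter less than the encoding on the wire.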
I think this one might be a good candidate for us to try, and possibly with zstd we could achieve both size and CPU overhead comparable with any "new" format we could come up with - especially as we are talking merely about converting a huge blob between its "storable" (compressed) and "locally usable" (Python dict) states. We could likely use a streaming JSON library (say, the one that is used internally in Pydantic, https://github.com/pydantic/jiter - we already have it as part of Pydantic) to also save memory - we could stream the decompressed data into jiter so that the json dict and the string representation do not both have to be fully loaded in memory at the same time. There are likely lots of optimisations we could do - I mentioned possibly streaming the data from the API directly to the DB (if this is possible - not sure).

J.

On Mon, Aug 4, 2025 at 9:10 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

I'm not sure about this point, because if we are able to convert a DAG into JSON, then it has to be transferable over the internet.
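[Editor's sketch] The streaming idea above can be sketched with stdlib pieces: zlib's decompressobj stands in for zstd's streaming decompressor, and json.loads stands in for a streaming parser like jiter (which can consume bytes directly, skipping the intermediate str). Chunk size and payload are illustrative assumptions.

```python
import io
import json
import zlib

def iter_decompressed(chunks):
    """Incrementally decompress byte chunks; only one chunk's worth of
    compressed input (plus its decompressed output) is held at a time."""
    d = zlib.decompressobj()
    for chunk in chunks:
        yield d.decompress(chunk)
    yield d.flush()

# Simulate a compressed serialized-DAG blob arriving from the API in 4 KiB chunks.
payload = json.dumps({"dag_id": "demo", "tasks": list(range(1000))}).encode()
blob = zlib.compress(payload)
chunks = (blob[i:i + 4096] for i in range(0, len(blob), 4096))

# A streaming parser such as jiter could consume these bytes as they arrive;
# stdlib json.loads needs the whole buffer, so this sketch re-assembles it.
buf = io.BytesIO()
for part in iter_decompressed(chunks):
    buf.write(part)
doc = json.loads(buf.getvalue())
print(doc["dag_id"], len(doc["tasks"]))
```

With a bytes-consuming parser, the re-assembly buffer disappears and peak memory stays close to one decompressed chunk plus the resulting dict.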
> In particular, one of the things I worry about is that the JSON can get huge — I've seen this as large as 10-20Mb for some dags

Yeah, agree on this; that's why we can transfer compressed data instead of the real json. Of course, this won't guarantee that the payload will always be small enough, but we can't say that it'll definitely happen either.

> I also wonder if as part of this proposal we should move the Callback requests off the dag parsers and on to the workers instead

> let's make such a "workload" implementation stream that could support both - Deadlines and DAG parsing logic

I don't have any strong opinion here, but it feels like it's gonna blow up the scope of the AIP too much.

On Fri, Aug 1, 2025 at 2:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

Yep. That might potentially be a problem (or at least "need more resources to run airflow"), and that is where my "2x memory" came from if we do it in a trivial way.
Currently we a) keep the whole DAG in memory when serializing it, and b) submit it to the database (also using, essentially, some kind of API - implemented by the database client) - so we know the whole thing "might work". But indeed, with a trivial implementation of submitting the whole json, the whole json will also have to be kept in the memory of the API server. But we also compress it when needed - I wonder what compression ratios we saw with those 10-20MB Dags - if the problem is using strings where a bool would suffice, compression should generally help a lot. We could only ever send compressed data over the API - there seems to be no need to send "plain JSON" data over the API or to store plain JSON in the DB (of course, that trades memory for CPU).

I wonder if sqlalchemy 2 (and the drivers for MySQL/Postgres) have support for any kind of binary data streaming - because that could help a lot if we could use a streaming HTTP API and chunk and append the binary chunks (when writing), or read data in chunks and stream them back via the API. That could seriously decrease the amount of memory needed by the API server to process such huge serialized dags.
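[Editor's sketch] On the write side, the chunk-and-append idea can be shown as incremental compression into a sink, so neither the full JSON string nor the full compressed blob is ever materialized at once. Again zlib stands in for zstd, io.BytesIO stands in for a streaming HTTP body or DB blob writer, and the one-JSON-line-per-task framing is an illustrative assumption.

```python
import io
import json
import zlib

def compress_stream(parts, sink, level=6):
    """Compress an iterable of byte chunks into `sink` without ever holding
    the full payload or the full compressed blob in memory at once."""
    c = zlib.compressobj(level)
    total_in = 0
    for part in parts:
        total_in += len(part)
        sink.write(c.compress(part))
    sink.write(c.flush())
    return total_in

# Pretend the serialized DAG is produced piecewise (here: one JSON line per
# task) so the client never materializes one giant JSON string.
parts = (
    json.dumps({"task_id": f"t{i}", "operator": "Bash"}).encode() + b"\n"
    for i in range(2000)
)

sink = io.BytesIO()  # stands in for a streaming HTTP body / DB blob writer
n = compress_stream(parts, sink)
print(f"streamed {n} bytes in, {sink.tell()} bytes stored")
```

Whether the DB drivers can accept such a sink incrementally is exactly the open question raised above; the sketch only shows the client/API-server side of the pipeline.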
And yeah - I would also love the "execute task" part to be implemented here - but I am not sure if this should be part of the same effort or maybe a separate implementation? That sounds very loosely coupled with DB isolation. And it seems a common theme - I think that would also cover the sync Deadline alerts case that we discussed at the dev call today. I wonder if that should not be done in parallel (let's make such a "workload" implementation stream that could support both - Deadlines and the DAG parsing logic). We already have two "users" for it, and I really love the saying "if you want to make something reusable - make it usable first" - it seems we might have a good opportunity to make such a workload implementation "doubly used" from the beginning, which would increase the chances it will be "reusable" for other things as well :).

J.

On Thu, Jul 31, 2025 at 12:28 PM Ash Berlin-Taylor <a...@apache.org> wrote:

My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.
In particular, one of the things I worry about is that the JSON can get huge — I've seen this as large as 10-20Mb for some dags(!!) (which is likely due to things being included as text when a bool might suffice, for example). But I don't think "just submit the existing JSON over an API" is a good idea.

I also wonder if, as part of this proposal, we should move the Callback requests off the dag parsers and on to the workers instead — in AIP-72 we introduced the concept of a Workload, with the only one existing right now being "ExecuteTask":
https://github.com/apache/airflow/blob/8e1201c7713d5c677fa6f6d48bbd4f6903505f61/airflow-core/src/airflow/executors/workloads.py#L87-L88
— it might be time to finally move task and dag callbacks to the same thing and make dag parsers only responsible for, well, parsing. :)

These are all solvable problems, and this will be a great feature to have, but we need to do some more thinking and planning first.

-ash

On 31 Jul 2025, at 10:12, Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Gentle reminder for everyone to review the proposal.
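[Editor's sketch] Ash's suggestion above — callbacks as just another Workload dispatched to workers — could look roughly like this. Names and fields are hypothetical illustrations, not the actual AIP-72 classes (the real ExecuteTask in airflow/executors/workloads.py carries different data).

```python
from dataclasses import dataclass, field
from typing import Literal, Union

# Hypothetical sketch of extending the AIP-72 "Workload" idea so callbacks
# become one more workload handed to workers, leaving dag parsers to only
# parse. Illustrative only - not the real Airflow classes.

@dataclass
class ExecuteTask:
    kind: Literal["ExecuteTask"] = field(default="ExecuteTask", init=False)
    dag_id: str = ""
    task_id: str = ""

@dataclass
class RunDagCallback:
    kind: Literal["RunDagCallback"] = field(default="RunDagCallback", init=False)
    dag_id: str = ""
    success: bool = True

Workload = Union[ExecuteTask, RunDagCallback]

def dispatch(w: Workload) -> str:
    # A worker switches on the discriminator, so a callback no longer needs
    # to run inside the dag processor at all.
    if w.kind == "ExecuteTask":
        return f"run {w.dag_id}.{w.task_id}"
    return f"callback for {w.dag_id} (success={w.success})"

print(dispatch(RunDagCallback(dag_id="etl", success=False)))
```

A discriminated union like this is also what makes the queue payload versionable: new workload kinds can be added without touching existing consumers.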
Updated link:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-92+Isolate+DAG+processor%2C+Callback+processor%2C+and+Triggerer+from+core+services

On Tue, Jul 29, 2025 at 4:37 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Thanks everyone for reviewing this AIP. As Jarek and others suggested, I expanded the scope of this AIP and divided it into three phases. With the increased scope, the boundary line between this AIP and AIP-85 got a little thinner, but I believe these are still two different enhancements to make.

On Fri, Jul 25, 2025 at 10:51 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Yeah, overall it makes sense to include Triggers as well as part of this AIP and phase the implementation. Though I didn't exclude Triggers because "Uber" doesn't need that - I just thought of keeping the scope of development small and achievable, just like it was done in Airflow 3 by secluding only workers and not the DAG-processor & Triggers.
But if you think Triggers should be part of this AIP itself, then I can do that and include Triggers in it as well.

On Fri, Jul 25, 2025 at 7:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:

I would very much prefer the architectural choices of this AIP to be based on "general public" needs rather than "Uber needs", even if Uber will be implementing it - so from my point of view, having Trigger separation as part of it is quite important.

But that's not even it.

We've been discussing, for example for Deadlines (being implemented by Dennis and Ramit), a possibility of short, notification-style "deadlines" to be sent to the triggerer for execution - this is well advanced now, and whether you want it or not, Dag-provided code might be serialized and sent to the triggerer for execution. This is part of our "broader" architectural change where we treat "workers" and "triggerer" similarly, as general executors of "sync" and "async" tasks respectively. That's where Airflow is evolving towards - inevitably.
But we can of course phase things in/out for implementation - even if the AIP should cover both. I think if the goal of the AIP, and the preamble, is about separating "user code" from "database" as the main reason, it also means the Triggerer, if you ask me (from a design point of view at least).

Again, implementation can be phased, and even different people and teams might work on those phases/pieces.

J.

On Fri, Jul 25, 2025 at 2:29 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

> > #2. Yeah, we would need something similar for triggerers as well, but that can be done as part of a different AIP
>
> You won't achieve your goal of "true" isolation of user code if you don't do triggerer. I think if the goal is to achieve it - it should cover both.

My bad, I should've explained our architecture for triggers as well, apologies.
So here it is: > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> - Triggers would be running on a centralized service, > > so > > > > >> >> all > > > > >> >>>> the > > > > >> >>>>>>>>>> Trigger > > > > >> >>>>>>>>>>> classes will be part of the platform team's repo and > > not > > > > >> >> the > > > > >> >>>>>>>>>> customer's > > > > >> >>>>>>>>>>> repo > > > > >> >>>>>>>>>>> - The triggers won't be able to use any libs other > than > > > std > > > > >> >>>> ones, > > > > >> >>>>>>>>>> which > > > > >> >>>>>>>>>>> are being used in core Airflow (like requests, etc) > > > > >> >>>>>>>>>>> - As we are the owners of the core Airflow repo, > > > customers > > > > >> >>> have > > > > >> >>>>> to > > > > >> >>>>>>>>>> get > > > > >> >>>>>>>>>>> our approval to land any class in this path (unlike > the > > > > >> >> dags > > > > >> >>>> repo > > > > >> >>>>>>>>>> which > > > > >> >>>>>>>>>>> they own) > > > > >> >>>>>>>>>>> - When a customer's task defer, we would have an > > > allowlist > > > > >> >> on > > > > >> >>>> our > > > > >> >>>>>>>>>> side > > > > >> >>>>>>>>>>> to check if we should do the async polling or not > > > > >> >>>>>>>>>>> - If the Trigger class isn't part of our repo > > > (allowlist), > > > > >> >>> just > > > > >> >>>>>>>>>> fail the > > > > >> >>>>>>>>>>> task, as anyway we won't be having the code that they > > > used > > > > >> >> in > > > > >> >>>> the > > > > >> >>>>>>>>>>> trigger > > > > >> >>>>>>>>>>> class > > > > >> >>>>>>>>>>> - If any of these conditions aren't suitable for you > > (as > > > a > > > > >> >>>>>>>>>> customer), > > > > >> >>>>>>>>>>> feel free to use sync tasks only > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> But in general, I agree to make triggerer svc also > > > > >> >> communicate > > > > >> >>>> over > > > > >> >>>>>>>>>> apis > > > > >> >>>>>>>>>>> only. 
If that is done, then we can have instances of > > > > >> >> triggerer > > > > >> >>>> svc > > > > >> >>>>>>>>>> running > > > > >> >>>>>>>>>>> at customer's side as well, which can process any type > > of > > > > >> >>> trigger > > > > >> >>>>>>>>>> class. > > > > >> >>>>>>>>>>> Though that's not a blocker for us at the moment, > cause > > > > >> >>> triggerer > > > > >> >>>>> are > > > > >> >>>>>>>>>>> mostly doing just polling using simple libs like > > requests. > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> On Fri, Jul 25, 2025 at 5:03 PM Igor Kholopov > > > > >> >>>>>>>>>> <ikholo...@google.com.invalid > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>> wrote: > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>>> Thanks Sumit for the detailed proposal. Overall I > > believe > > > > it > > > > >> >>>>> aligns > > > > >> >>>>>>>>>> well > > > > >> >>>>>>>>>>>> with the goals of making Airflow well-scalable > beyond a > > > > >> >>>>> single-team > > > > >> >>>>>>>>>>>> deployment (and AIP-85 goals), so you have my full > > > support > > > > >> >>> with > > > > >> >>>>> this > > > > >> >>>>>>>>>> one. > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> I've left a couple of clarification requests on the > AIP > > > > >> >> page. > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> Thanks, > > > > >> >>>>>>>>>>>> Igor > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> On Fri, Jul 25, 2025 at 11:50 AM Sumit Maheshwari < > > > > >> >>>>>>>>>>> sumeet.ma...@gmail.com> > > > > >> >>>>>>>>>>>> wrote: > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>>> Thanks Jarek and Ash, for the initial review. It's > > good > > > to > > > > >> >>> know > > > > >> >>>>>>>>>> that > > > > >> >>>>>>>>>>> the > > > > >> >>>>>>>>>>>>> DAG processor has some preemptive measures in place > to > > > > >> >>> prevent > > > > >> >>>>>>>>>> access > > > > >> >>>>>>>>>>>>> to the DB. 
> > > > > > However, the main issue we are trying to solve is not to provide
> > > > > > DB creds to the customer teams, who are using Airflow as a
> > > > > > multi-tenant orchestration platform. I've updated the doc to
> > > > > > reflect this point as well.
> > > > > >
> > > > > > Answering Jarek's points:
> > > > > >
> > > > > > #1. Yeah, I had forgotten to write about the token mechanism;
> > > > > > added that in the doc, but how the token can be obtained (safely)
> > > > > > is still open in my mind. I believe the token used by task
> > > > > > executors can be created outside of it as well (I may be wrong
> > > > > > here).
> > > > > >
> > > > > > #2. Yeah, we would need something similar for triggerers as well,
> > > > > > but that can be done as part of a different AIP.
> > > > > >
> > > > > > #3. Yeah, I also believe the API should largely work.
> > > > > >
> > > > > > #4. Added that in the AIP: instead of dag_dirs we can work with
> > > > > > dag_bundles, and every dag-processor instance would be treated as
> > > > > > a diff bundle.
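[Editor's note: the token mechanism discussed in #1 amounts to the long-running service attaching a service token to every API call instead of holding DB creds. A minimal sketch under stated assumptions - the `/dag-processor/v1` namespace, the endpoint path, and the helper names are all made up for illustration, not the real Airflow API.]

```python
# Illustrative only: a dag-processor-like service building an authenticated
# request against a hypothetical API namespace, rather than opening a DB session.
import urllib.request

API_BASE = "http://localhost:8080/dag-processor/v1"  # hypothetical namespace

def build_request(path: str, token: str) -> urllib.request.Request:
    """Attach the service token as a bearer credential on an API request."""
    req = urllib.request.Request(f"{API_BASE}{path}")
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = build_request("/callbacks/pending", "service-token")
print(req.get_header("Authorization"))  # Bearer service-token
```

How such a token is minted and rotated safely is exactly the open question in #1; this sketch only shows where it would be carried.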
> > > > > > Also, added points around callbacks, as these are also fetched
> > > > > > directly from the DB.
> > > > > >
> > > > > > On Fri, Jul 25, 2025 at 11:58 AM Jarek Potiuk <ja...@potiuk.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > A clarification to this - the dag parser today is likely not
> > > > > > > > protection against a dedicated malicious DAG author, but it
> > > > > > > > does protect against casual DB access attempts - the db
> > > > > > > > session is blanked out in the parsing process, as are the env
> > > > > > > > var configs
> > > > > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > > > > > - is this perfect? no, but it's much more than no protection
> > > > > > >
> > > > > > > Oh absolutely.. This is exactly what we discussed back then in
> > > > > > > March, I think - and the way we decided to go for 3.0 with full
> > > > > > > knowledge it's not protecting against all threats.
> > > > > > > On Fri, Jul 25, 2025 at 8:22 AM Ash Berlin-Taylor
> > > > > > > <a...@apache.org> wrote:
> > > > > > >
> > > > > > > > A clarification to this - the dag parser today is likely not
> > > > > > > > protection against a dedicated malicious DAG author, but it
> > > > > > > > does protect against casual DB access attempts - the db
> > > > > > > > session is blanked out in the parsing process, as are the env
> > > > > > > > var configs
> > > > > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > > > > > - is this perfect? no, but it's much more than no protection
> > > > > > > >
> > > > > > > > > On 24 Jul 2025, at 21:56, Jarek Potiuk <ja...@potiuk.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Currently in the DagFile processor there is no built-in
> > > > > > > > > protection against user code from Dag Parsing to - for
> > > > > > > > > example - read database credentials from airflow
> > > > > > > > > configuration and use them to talk to DB directly.
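[Editor's note: the env-var blanking Ash points at (see the supervisor.py link above) boils down to scrubbing sensitive configuration from the environment handed to the subprocess that runs user code. The snippet below is a minimal illustration of that idea only - it is not Airflow's actual supervisor code, and the variable prefixes are assumptions.]

```python
# Sketch: strip DB-credential env vars before spawning the parsing subprocess,
# so a casual access attempt from user code simply finds nothing.
import os
import subprocess
import sys

SENSITIVE_PREFIXES = ("AIRFLOW__DATABASE__", "AIRFLOW__CORE__SQL_ALCHEMY")

def scrubbed_env() -> dict:
    """Copy of the current environment with sensitive keys removed."""
    return {k: v for k, v in os.environ.items()
            if not k.startswith(SENSITIVE_PREFIXES)}

os.environ["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = "postgresql://secret"
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('AIRFLOW__DATABASE__SQL_ALCHEMY_CONN'))"],
    env=scrubbed_env(), capture_output=True, text=True,
).stdout.strip()
print(out)  # None
```

As the thread notes, this stops casual attempts, not a dedicated malicious DAG author - the parsing process still executes arbitrary user code.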