Those are all questions that we will open up for discussion once we get the basic foundation :)
On Wed, Aug 13, 2025 at 10:16 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Awesome, having a standard approach for all kinds of authentication would be great, looking forward to it.

BTW, on a side note, I see that as of now, things like Connections, Variables, and XComs which are present under the Execution API namespace don't have any authorization model (left with TODOs), so is there any plan for how they will work? Because we might need something similar for Dag-processor/Triggerer as well.

Also, as we've decided to create a different API namespace and not use the execution namespace for hosting the new APIs required by dag-processor/triggerer, do we have to copy these classes & routes, or at least refactor them so they can serve all API namespaces?

On Wed, Aug 13, 2025 at 1:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Will do. We are also discussing - for now, within the security team - the various aspects of the authentication approach we want to have - for both "UI/User authentication" as well as "Long running services" - and how they relate to token exchange, invalidation and other scenarios. What we will come up with - I hope shortly - is a proposal for a general "model" of all kinds of security and authentication scenarios, so that we do not have to reinvent the wheel and try to figure out all the aspects with individual AIPs. We are not far from bringing it to open discussion on the devlist, but we have to be careful about some of the aspects that we might need to improve in the current setup to close some - small - loopholes, so bear with us :)

On Wed, Aug 13, 2025 at 9:33 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Thanks Ash and Jarek, for the detailed comments. I largely agree with the points mentioned by you guys, hence I updated the AIP and added a section on Authentication, API Versioning, and Packaging as well.
Please go through it once more and let me know if there are more things to consider before I open it for voting.

On Thu, Aug 7, 2025 at 6:03 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Also:

> 1. The Authentication token. How will this long-lived token work without being insecure? Who and what will generate it? How will we identify top-level requests for Variables in order to be able to add Variable RBAC/ACLs? This is an important enough thing that I think it needs discussion before we vote on this AIP.

We are currently discussing - in the security team - an approach for JWT token handling, so likely we could move the discussion there; it does have some security implications and I think we should bring our findings to the devlist when we complete it, but I think we should add this case there. IMHO we should have a different approach for UI, different for Tasks, different for Triggerer, and different for DagProcessor (possibly the Triggerer and DagProcessor could be the same because they share essentially the same long-living token). Ash - I will add this to the discussion there.

J.

On Thu, Aug 7, 2025 at 2:23 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Ah.. So if we are talking about a more complete approach - seeing those comments from Ash - it makes me think we should have another (connected) AIP about splitting the distributions. We have never finalized it (nor even discussed it), but Ash - you had some initial document for that. So maybe we should finalize it and, rather than specify it in this AIP, have a separate AIP about the distribution split that AIP-92 could depend on.
It seems much more reasonable to split "distribution and code split" from parsing isolation, I think, and implement them separately/in parallel.

Reading Ash's comments (and maybe I am going a bit further than Ash), it calls for something that I am a big proponent of - splitting "airflow-core" and having different "scheduler", "webserver", "dag processor" and "triggerer" distributions. Now we have the capability of having "shared" code - we do not need "common" code to make it happen - because we can share code.

What it could give us - on top of a clean client/server split - is that we could have different dependencies used by those distributions. Additionally, we could also split off the executors from providers and finally implement it in such a way that the scheduler does not use providers at all (not even the cncf.kubernetes or celery providers installed in the scheduler or webserver, but "executors" packages instead). The code sharing approach with symlinks we have now will make it a .... breeze :) . That would also imply sharing "connection" definitions through the DB, and likely finally implementing the "test connection" feature properly (i.e. executing test connection in the worker / triggerer rather than in the web server, which is the reason why we disabled it by default now). This way the "api-server" would not need any of the providers to be installed either, which IMHO is the biggest win from a security point of view.
And the nice thing about it is that it would be rather transparent when anyone uses "pip install apache-airflow" - it would behave exactly the same, no more complexity involved, simply more distributions installed when the "apache-airflow" meta-distribution is used. But it would allow those who want to implement a more complex and secure setup to have different "environments" with modularized pieces of Airflow installed - only "apache-airflow-dag-processor + task-sdk + providers" where the dag-processor is run, only "apache-airflow-scheduler + executors" where the scheduler is installed, only "apache-airflow-task-sdk + providers" where workers are running, only "apache-airflow-api-server" where the api-server is running, and only "apache-airflow-triggerer + task-sdk + providers" where the triggerer is running.

I am happy (Ash, if you are fine with that) to take that original document over and lead this part and the new AIP to completion (including implementation). I am very much convinced that this will lead to much better dependency security and more modular code without impacting the "apache-airflow" installation complexity.

If we do it this way, the code/clean-split part would be "delegated out" from AIP-92 to this new AIP and turned into a dependency.

J.

On Thu, Aug 7, 2025 at 1:51 PM Ash Berlin-Taylor <a...@apache.org> wrote:

This AIP is definitely heading in the right direction and is a feature I'd like to see.

For me the outstanding things that need more detail:

1. The Authentication token. How will this long-lived token work without being insecure? Who and what will generate it?
How will we identify top-level requests for Variables in order to be able to add Variable RBAC/ACLs? This is an important enough thing that I think it needs discussion before we vote on this AIP.
2. Security generally — how will this work, especially with multi-team? I think this likely means making the APIs work at the bundle level as you mention in the doc, but I haven't thought deeply about this yet.
3. API Versioning? One of the key driving goals with AIP-72 and the Task Execution SDK was the idea that "you can upgrade the API server as you like, and your clients/workers never need to change" — i.e. the API server is 100% working with all older versions of the TaskSDK. I don't know if we will achieve that goal in the long run but it is the desire, and part of why we are using CalVer and the Cadwyn library to provide API versioning.
4. As mentioned previously, I'm not sure the existing serialised JSON format for DAGs is correct, but since that now has a version, and we already have the ability to upgrade that somewhere in the Airflow Core, it doesn't necessarily become a blocker/pre-requisite for this AIP.

I think the Dag parsing API client + submission + parsing process manager should either live in the Task SDK dist, or in a new separate dist that uses TaskSDK, but crucially not in apache-airflow-core.
My reason for this is that I want it to be possible for the server components (scheduler, API server) to not need task-sdk installed (just for cleanliness/avoiding confusion about what versions it needs) and also, vice versa, to be able to run a "team worker bundle" (Dag parsing, workers, triggerer/async workers) on whatever version of TaskSDK they choose, again without apache-airflow-core installed, for avoidance of doubt.

Generally I would like this as it means we can have a nicer separation of Core and Dag parsing code; as the dag parsing itself uses the SDK, it would be nice to have a proper server/client split, both from a tighter security point of view and from a code layout point of view.

-ash

On 7 Aug 2025, at 12:36, Jarek Potiuk <ja...@potiuk.com> wrote:

Well, you started it - so it's up to you to decide if you think we have consensus, or whether we need a vote.

And it's not a question of an "informal" vote; it's rather clear, following https://www.apache.org/foundation/voting.html, that we either need a LAZY CONSENSUS or a VOTE thread. Both are formal.

This is the difficult part when you have a proposal: to assess (by you) whether we are converging to consensus or whether a vote is needed. There is no other body or "authority" to do it for you.

J.
On Thu, Aug 7, 2025 at 1:02 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Sorry for nudging again, but can we reach some consensus on this? I mean, if this AIP isn't good enough, then we can drop it altogether and someone can rethink the whole thing. Should we do some kind of informal voting and close this thread?

On Mon, Aug 4, 2025 at 3:32 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

I understand Ash's concerns - the format has not been designed with size/speed optimization in mind, so **possibly** we could design a different format that would be better suited.

BUT ... Done is better than perfect.

I think there are a number of risks involved in changing the format, and it could significantly increase development time with uncertain gains at the end - also because of the progress in compression that has happened over the last few years.

It might be a good idea to experiment a bit with different compression algorithms for "our" dag representation, and possibly we could find the best algorithm for the "airflow dag" type of json data.
There are a lot of repetitions in the JSON representation, and I guess in "our" json representation there are some artifacts and repeated sections that simply might compress well with different algorithms. Also, in this case speed matters (and the CPU trade-off).

Looking at compression "theory" - before we experiment with it - there is the relatively new "zstandard" standard https://github.com/facebook/zstd, a compression algorithm open-sourced in 2016, which I've heard good things about - especially that it maintains a very good compression rate for text data, but is also tremendously fast - especially for decompression (which is a super important factor for us - in the general case we compress a new DAG representation far less often than we decompress it). It is standardized in RFC https://datatracker.ietf.org/doc/html/rfc8878, there are various implementations, it is even being added to the Python standard library in Python 3.14 https://docs.python.org/3.14/library/compression.zstd.html, and there is a very well maintained python binding library https://pypi.org/project/zstd/ to Yann Collet's (the algorithm author's) ZSTD C library. And libzstd is already part of our images - it is needed by other dependencies of ours. All with a BSD licence, directly usable by us.
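[Editor's sketch] To get a feel for the ratios being discussed on repetitive DAG-style JSON, here is a minimal experiment. It uses the stdlib zlib module as a stand-in, since compression.zstd only lands in Python 3.14 - a zstd compress/decompress call would slot in the same way. The payload shape is invented for illustration, not the real Airflow serialization schema.

```python
import json
import zlib

# A deliberately repetitive payload shaped like a serialized DAG: many tasks
# with near-identical structure, and strings where smaller types would do.
# (Illustrative shape only - not the real Airflow serialization schema.)
dag = {
    "dag_id": "example",
    "tasks": [
        {
            "task_id": f"task_{i}",
            "operator": "PythonOperator",
            "retries": "0",
            "depends_on_past": "False",
            "pool": "default_pool",
        }
        for i in range(500)
    ],
}

raw = json.dumps(dag).encode("utf-8")
packed = zlib.compress(raw, 6)  # a zstd binding would be a drop-in here

ratio = len(raw) / len(packed)
print(f"raw={len(raw)} B, packed={len(packed)} B, ratio={ratio:.1f}x")

# Round-trip: decompressing and re-parsing yields the same dict.
assert json.loads(zlib.decompress(packed).decode("utf-8")) == dag
```

Highly repetitive structures like this compress by an order of magnitude even with zlib, which supports the point that the format itself may matter less than the encoding on the wire.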
I think this one might be a good candidate for us to try, and possibly with zstd we could achieve both size and CPU overhead comparable with any "new" format we could come up with - especially as we are talking merely about converting a huge blob between its "storable" (compressed) and "locally usable" (Python dict) states. We could likely use a streaming JSON library (say, the one that is used internally in Pydantic, https://github.com/pydantic/jiter - we already have it as part of Pydantic) to also save memory - we could stream the decompressed data into jiter so that the json dict and the string representation do not both have to be fully loaded in memory at the same time. There are likely lots of optimisations we could do - I mentioned possibly streaming the data from the API directly to the DB (if this is possible - not sure).

J.

On Mon, Aug 4, 2025 at 9:10 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

I'm not sure about this point, because if we are able to convert a DAG into JSON, then it has to be transferable over the internet.
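[Editor's sketch] The streaming idea above can be sketched with stdlib pieces: zlib's decompressobj stands in for zstd's streaming decompressor, and json.loads stands in for a streaming parser like jiter (which can consume bytes directly, skipping the intermediate str). Chunk size and payload are illustrative assumptions.

```python
import io
import json
import zlib

def iter_decompressed(chunks):
    """Incrementally decompress byte chunks; only one chunk's worth of
    compressed input (plus its decompressed output) is held at a time."""
    d = zlib.decompressobj()
    for chunk in chunks:
        yield d.decompress(chunk)
    yield d.flush()

# Simulate a compressed serialized-DAG blob arriving from the API in 4 KiB chunks.
payload = json.dumps({"dag_id": "demo", "tasks": list(range(1000))}).encode()
blob = zlib.compress(payload)
chunks = (blob[i:i + 4096] for i in range(0, len(blob), 4096))

# A streaming parser such as jiter could consume these bytes as they arrive;
# stdlib json.loads needs the whole buffer, so this sketch re-assembles it.
buf = io.BytesIO()
for part in iter_decompressed(chunks):
    buf.write(part)
doc = json.loads(buf.getvalue())
print(doc["dag_id"], len(doc["tasks"]))
```

With a bytes-consuming parser, the re-assembly buffer disappears and peak memory stays close to one decompressed chunk plus the resulting dict.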
> In particular, one of the things I worry about is that the JSON can get huge — I've seen this as large as 10-20Mb for some dags

Yeah, agree on this; that's why we can transfer compressed data instead of the real json. Of course, this won't guarantee that the payload will always be small enough, but we can't say that it'll definitely happen either.

> I also wonder if as part of this proposal we should move the Callback requests off the dag parsers and on to the workers instead

> let's make such a "workload" implementation stream that could support both - Deadlines and DAG parsing logic

I don't have any strong opinion here, but it feels like it's gonna blow up the scope of the AIP too much.

On Fri, Aug 1, 2025 at 2:27 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.

Yep. That might potentially be a problem (or at least "need more resources to run airflow"), and that is where my "2x memory" came from if we do it in a trivial way.
Currently we a) keep the whole DAG in memory when serializing it, and b) submit it to the database (also using, essentially, some kind of API - implemented by the database client) - so we know the whole thing "might work". But indeed, with a trivial implementation of submitting the whole json, the whole json will also have to be kept in the memory of the API server. But we also compress it when needed - I wonder what compression ratios we saw with those 10-20MB Dags - if the problem is using strings where a bool would suffice, compression should generally help a lot. We could only ever send compressed data over the API - there seems to be no need to send "plain JSON" data over the API or to store plain JSON in the DB (of course, that trades memory for CPU).

I wonder if sqlalchemy 2 (and the drivers for MySQL/Postgres) have support for any kind of binary data streaming - because that could help a lot if we could use a streaming HTTP API and chunk and append the binary chunks (when writing), or read data in chunks and stream them back via the API. That could seriously decrease the amount of memory needed by the API server to process such huge serialized dags.
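[Editor's sketch] On the write side, the chunk-and-append idea can be shown as incremental compression into a sink, so neither the full JSON string nor the full compressed blob is ever materialized at once. Again zlib stands in for zstd, io.BytesIO stands in for a streaming HTTP body or DB blob writer, and the one-JSON-line-per-task framing is an illustrative assumption.

```python
import io
import json
import zlib

def compress_stream(parts, sink, level=6):
    """Compress an iterable of byte chunks into `sink` without ever holding
    the full payload or the full compressed blob in memory at once."""
    c = zlib.compressobj(level)
    total_in = 0
    for part in parts:
        total_in += len(part)
        sink.write(c.compress(part))
    sink.write(c.flush())
    return total_in

# Pretend the serialized DAG is produced piecewise (here: one JSON line per
# task) so the client never materializes one giant JSON string.
parts = (
    json.dumps({"task_id": f"t{i}", "operator": "Bash"}).encode() + b"\n"
    for i in range(2000)
)

sink = io.BytesIO()  # stands in for a streaming HTTP body / DB blob writer
n = compress_stream(parts, sink)
print(f"streamed {n} bytes in, {sink.tell()} bytes stored")
```

Whether the DB drivers can accept such a sink incrementally is exactly the open question raised above; the sketch only shows the client/API-server side of the pipeline.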
And yeah - I would also love the "execute task" part to be implemented here - but I am not sure if this should be part of the same effort or maybe a separate implementation? That sounds very loosely coupled with DB isolation. And it seems a common theme - I think that would also cover the sync Deadline alerts case that we discussed at the dev call today. I wonder if that should not be done in parallel (let's make such a "workload" implementation stream that could support both - Deadlines and the DAG parsing logic). We already have two "users" for it, and I really love the saying "if you want to make something reusable - make it usable first" - it seems we might have a good opportunity to make such a workload implementation "doubly used" from the beginning, which would increase the chances it will be "reusable" for other things as well :).

J.

On Thu, Jul 31, 2025 at 12:28 PM Ash Berlin-Taylor <a...@apache.org> wrote:

My main concern with this right now is the serialisation format of DAGs — it wasn't really designed with remote submission in mind, so it needs some careful examination to see if it is fit for this purpose or not.
In particular, one of the things I worry about is that the JSON can get huge — I've seen this as large as 10-20Mb for some dags(!!) (which is likely due to things being included as text when a bool might suffice, for example). But I don't think "just submit the existing JSON over an API" is a good idea.

I also wonder if, as part of this proposal, we should move the Callback requests off the dag parsers and on to the workers instead — in AIP-72 we introduced the concept of a Workload, with the only one existing right now being "ExecuteTask":
https://github.com/apache/airflow/blob/8e1201c7713d5c677fa6f6d48bbd4f6903505f61/airflow-core/src/airflow/executors/workloads.py#L87-L88
— it might be time to finally move task and dag callbacks to the same thing and make dag parsers only responsible for, well, parsing. :)

These are all solvable problems, and this will be a great feature to have, but we need to do some more thinking and planning first.

-ash

On 31 Jul 2025, at 10:12, Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Gentle reminder for everyone to review the proposal.
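[Editor's sketch] Ash's suggestion above — callbacks as just another Workload dispatched to workers — could look roughly like this. Names and fields are hypothetical illustrations, not the actual AIP-72 classes (the real ExecuteTask in airflow/executors/workloads.py carries different data).

```python
from dataclasses import dataclass, field
from typing import Literal, Union

# Hypothetical sketch of extending the AIP-72 "Workload" idea so callbacks
# become one more workload handed to workers, leaving dag parsers to only
# parse. Illustrative only - not the real Airflow classes.

@dataclass
class ExecuteTask:
    kind: Literal["ExecuteTask"] = field(default="ExecuteTask", init=False)
    dag_id: str = ""
    task_id: str = ""

@dataclass
class RunDagCallback:
    kind: Literal["RunDagCallback"] = field(default="RunDagCallback", init=False)
    dag_id: str = ""
    success: bool = True

Workload = Union[ExecuteTask, RunDagCallback]

def dispatch(w: Workload) -> str:
    # A worker switches on the discriminator, so a callback no longer needs
    # to run inside the dag processor at all.
    if w.kind == "ExecuteTask":
        return f"run {w.dag_id}.{w.task_id}"
    return f"callback for {w.dag_id} (success={w.success})"

print(dispatch(RunDagCallback(dag_id="etl", success=False)))
```

A discriminated union like this is also what makes the queue payload versionable: new workload kinds can be added without touching existing consumers.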
Updated link:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-92+Isolate+DAG+processor%2C+Callback+processor%2C+and+Triggerer+from+core+services

On Tue, Jul 29, 2025 at 4:37 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Thanks everyone for reviewing this AIP. As Jarek and others suggested, I expanded the scope of this AIP and divided it into three phases. With the increased scope, the boundary line between this AIP and AIP-85 got a little thinner, but I believe these are still two different enhancements to make.

On Fri, Jul 25, 2025 at 10:51 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

Yeah, overall it makes sense to include Triggers as well as part of this AIP and phase the implementation. Though I didn't exclude Triggers because "Uber" doesn't need that - I just thought of keeping the scope of development small and achievable, just like it was done in Airflow 3 by secluding only workers and not the DAG-processor & Triggers.
But if you think Triggers should be part of this AIP itself, then I can do that and include Triggers in it as well.

On Fri, Jul 25, 2025 at 7:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:

I would very much prefer the architectural choices of this AIP to be based on "general public" needs rather than "Uber needs", even if Uber will be implementing it - so from my point of view, having Trigger separation as part of it is quite important.

But that's not even it.

We've been discussing, for example for Deadlines (being implemented by Dennis and Ramit), a possibility of short, notification-style "deadlines" to be sent to the triggerer for execution - this is well advanced now, and whether you want it or not, Dag-provided code might be serialized and sent to the triggerer for execution. This is part of our "broader" architectural change where we treat "workers" and "triggerer" similarly, as general executors of "sync" and "async" tasks respectively. That's where Airflow is evolving towards - inevitably.
But we can of course phase things in/out for implementation - even if the AIP should cover both. I think if the goal of the AIP, and the preamble, is about separating "user code" from "database" as the main reason, it also means the Triggerer, if you ask me (from a design point of view at least).

Again, implementation can be phased, and even different people and teams might work on those phases/pieces.

J.

On Fri, Jul 25, 2025 at 2:29 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

> > #2. Yeah, we would need something similar for triggerers as well, but that can be done as part of a different AIP
>
> You won't achieve your goal of "true" isolation of user code if you don't do triggerer. I think if the goal is to achieve it - it should cover both.

My bad, I should've explained our architecture for triggers as well, apologies.
So here it is: > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> - Triggers would be running on a centralized service, > > so > > > > >> >> all > > > > >> >>>> the > > > > >> >>>>>>>>>> Trigger > > > > >> >>>>>>>>>>> classes will be part of the platform team's repo and > > not > > > > >> >> the > > > > >> >>>>>>>>>> customer's > > > > >> >>>>>>>>>>> repo > > > > >> >>>>>>>>>>> - The triggers won't be able to use any libs other > than > > > std > > > > >> >>>> ones, > > > > >> >>>>>>>>>> which > > > > >> >>>>>>>>>>> are being used in core Airflow (like requests, etc) > > > > >> >>>>>>>>>>> - As we are the owners of the core Airflow repo, > > > customers > > > > >> >>> have > > > > >> >>>>> to > > > > >> >>>>>>>>>> get > > > > >> >>>>>>>>>>> our approval to land any class in this path (unlike > the > > > > >> >> dags > > > > >> >>>> repo > > > > >> >>>>>>>>>> which > > > > >> >>>>>>>>>>> they own) > > > > >> >>>>>>>>>>> - When a customer's task defer, we would have an > > > allowlist > > > > >> >> on > > > > >> >>>> our > > > > >> >>>>>>>>>> side > > > > >> >>>>>>>>>>> to check if we should do the async polling or not > > > > >> >>>>>>>>>>> - If the Trigger class isn't part of our repo > > > (allowlist), > > > > >> >>> just > > > > >> >>>>>>>>>> fail the > > > > >> >>>>>>>>>>> task, as anyway we won't be having the code that they > > > used > > > > >> >> in > > > > >> >>>> the > > > > >> >>>>>>>>>>> trigger > > > > >> >>>>>>>>>>> class > > > > >> >>>>>>>>>>> - If any of these conditions aren't suitable for you > > (as > > > a > > > > >> >>>>>>>>>> customer), > > > > >> >>>>>>>>>>> feel free to use sync tasks only > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> But in general, I agree to make triggerer svc also > > > > >> >> communicate > > > > >> >>>> over > > > > >> >>>>>>>>>> apis > > > > >> >>>>>>>>>>> only. 
If that is done, then we can have instances of > > > > >> >> triggerer > > > > >> >>>> svc > > > > >> >>>>>>>>>> running > > > > >> >>>>>>>>>>> at customer's side as well, which can process any type > > of > > > > >> >>> trigger > > > > >> >>>>>>>>>> class. > > > > >> >>>>>>>>>>> Though that's not a blocker for us at the moment, > cause > > > > >> >>> triggerer > > > > >> >>>>> are > > > > >> >>>>>>>>>>> mostly doing just polling using simple libs like > > requests. > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> On Fri, Jul 25, 2025 at 5:03 PM Igor Kholopov > > > > >> >>>>>>>>>> <ikholo...@google.com.invalid > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>> wrote: > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>>> Thanks Sumit for the detailed proposal. Overall I > > believe > > > > it > > > > >> >>>>> aligns > > > > >> >>>>>>>>>> well > > > > >> >>>>>>>>>>>> with the goals of making Airflow well-scalable > beyond a > > > > >> >>>>> single-team > > > > >> >>>>>>>>>>>> deployment (and AIP-85 goals), so you have my full > > > support > > > > >> >>> with > > > > >> >>>>> this > > > > >> >>>>>>>>>> one. > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> I've left a couple of clarification requests on the > AIP > > > > >> >> page. > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> Thanks, > > > > >> >>>>>>>>>>>> Igor > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>> On Fri, Jul 25, 2025 at 11:50 AM Sumit Maheshwari < > > > > >> >>>>>>>>>>> sumeet.ma...@gmail.com> > > > > >> >>>>>>>>>>>> wrote: > > > > >> >>>>>>>>>>>> > > > > >> >>>>>>>>>>>>> Thanks Jarek and Ash, for the initial review. It's > > good > > > to > > > > >> >>> know > > > > >> >>>>>>>>>> that > > > > >> >>>>>>>>>>> the > > > > >> >>>>>>>>>>>>> DAG processor has some preemptive measures in place > to > > > > >> >>> prevent > > > > >> >>>>>>>>>> access > > > > >> >>>>>>>>>>>>> to the DB. 
> > > > > > However, the main issue we are trying to solve is not to provide
> > > > > > DB creds to the customer teams, who are using Airflow as a
> > > > > > multi-tenant orchestration platform. I've updated the doc to
> > > > > > reflect this point as well.
> > > > > >
> > > > > > Answering Jarek's points:
> > > > > >
> > > > > > #1. Yeah, I had forgotten to write about the token mechanism;
> > > > > > added that in the doc, but how the token can be obtained (safely)
> > > > > > is still open in my mind. I believe the token used by task
> > > > > > executors can be created outside of it as well (I may be wrong
> > > > > > here).
> > > > > >
> > > > > > #2. Yeah, we would need something similar for triggerers as well,
> > > > > > but that can be done as part of a different AIP.
> > > > > >
> > > > > > #3. Yeah, I also believe the API should largely work.
> > > > > >
> > > > > > #4. Added that in the AIP: instead of dag_dirs we can work with
> > > > > > dag_bundles, and every dag-processor instance would be treated as
> > > > > > a diff bundle.
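[Editor's note: the token mechanism discussed in #1 amounts to the long-running service attaching a service token to every API call instead of holding DB creds. A minimal sketch under stated assumptions - the `/dag-processor/v1` namespace, the endpoint path, and the helper names are all made up for illustration, not the real Airflow API.]

```python
# Illustrative only: a dag-processor-like service building an authenticated
# request against a hypothetical API namespace, rather than opening a DB session.
import urllib.request

API_BASE = "http://localhost:8080/dag-processor/v1"  # hypothetical namespace

def build_request(path: str, token: str) -> urllib.request.Request:
    """Attach the service token as a bearer credential on an API request."""
    req = urllib.request.Request(f"{API_BASE}{path}")
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = build_request("/callbacks/pending", "service-token")
print(req.get_header("Authorization"))  # Bearer service-token
```

How such a token is minted and rotated safely is exactly the open question in #1; this sketch only shows where it would be carried.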
> > > > > > Also, added points around callbacks, as these are also fetched
> > > > > > directly from the DB.
> > > > > >
> > > > > > On Fri, Jul 25, 2025 at 11:58 AM Jarek Potiuk <ja...@potiuk.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > A clarification to this - the dag parser today is likely not
> > > > > > > > protection against a dedicated malicious DAG author, but it
> > > > > > > > does protect against casual DB access attempts - the db
> > > > > > > > session is blanked out in the parsing process, as are the env
> > > > > > > > var configs
> > > > > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > > > > > - is this perfect? no, but it's much more than no protection
> > > > > > >
> > > > > > > Oh absolutely.. This is exactly what we discussed back then in
> > > > > > > March, I think - and the way we decided to go for 3.0 with full
> > > > > > > knowledge it's not protecting against all threats.
> > > > > > > On Fri, Jul 25, 2025 at 8:22 AM Ash Berlin-Taylor
> > > > > > > <a...@apache.org> wrote:
> > > > > > >
> > > > > > > > A clarification to this - the dag parser today is likely not
> > > > > > > > protection against a dedicated malicious DAG author, but it
> > > > > > > > does protect against casual DB access attempts - the db
> > > > > > > > session is blanked out in the parsing process, as are the env
> > > > > > > > var configs
> > > > > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > > > > > - is this perfect? no, but it's much more than no protection
> > > > > > > >
> > > > > > > > > On 24 Jul 2025, at 21:56, Jarek Potiuk <ja...@potiuk.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Currently in the DagFile processor there is no built-in
> > > > > > > > > protection against user code from Dag Parsing to - for
> > > > > > > > > example - read database credentials from airflow
> > > > > > > > > configuration and use them to talk to DB directly.
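[Editor's note: the env-var blanking Ash points at (see the supervisor.py link above) boils down to scrubbing sensitive configuration from the environment handed to the subprocess that runs user code. The snippet below is a minimal illustration of that idea only - it is not Airflow's actual supervisor code, and the variable prefixes are assumptions.]

```python
# Sketch: strip DB-credential env vars before spawning the parsing subprocess,
# so a casual access attempt from user code simply finds nothing.
import os
import subprocess
import sys

SENSITIVE_PREFIXES = ("AIRFLOW__DATABASE__", "AIRFLOW__CORE__SQL_ALCHEMY")

def scrubbed_env() -> dict:
    """Copy of the current environment with sensitive keys removed."""
    return {k: v for k, v in os.environ.items()
            if not k.startswith(SENSITIVE_PREFIXES)}

os.environ["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = "postgresql://secret"
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('AIRFLOW__DATABASE__SQL_ALCHEMY_CONN'))"],
    env=scrubbed_env(), capture_output=True, text=True,
).stdout.strip()
print(out)  # None
```

As the thread notes, this stops casual attempts, not a dedicated malicious DAG author - the parsing process still executes arbitrary user code.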