Gentle reminder for everyone to review the proposal. Updated link: https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-92+Isolate+DAG+processor%2C+Callback+processor%2C+and+Triggerer+from+core+services
On Tue, Jul 29, 2025 at 4:37 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:

> Thanks everyone for reviewing this AIP. As Jarek and others suggested, I expanded the scope of this AIP and divided it into three phases. With the increased scope, the boundary between this AIP and AIP-85 got a little thinner, but I believe these are still two different enhancements to make.
>
> On Fri, Jul 25, 2025 at 10:51 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:
>
>> Yeah, overall it makes sense to include Triggers as part of this AIP as well and to phase the implementation. I didn't exclude Triggers because "Uber" doesn't need them; I just thought of keeping the scope of development small and achievable, just like it was done in Airflow 3 by secluding only workers and not the DAG processor & Triggers.
>>
>> But if you think Triggers should be part of this AIP itself, then I can do that and include Triggers in it as well.
>>
>> On Fri, Jul 25, 2025 at 7:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I would very much prefer that the architectural choices of this AIP be based on "general public" needs rather than "Uber needs", even if Uber will be implementing it - so from my point of view, having Trigger separation as part of it is quite important.
>>>
>>> But that's not even the main point.
>>>
>>> We've been discussing, for example for Deadlines (being implemented by Dennis and Ramit), the possibility of short, notification-style "deadlines" to be sent to the triggerer for execution - this is well advanced now, and whether you want it or not, Dag-provided code might be serialized and sent to the triggerer for execution. This is part of our "broader" architectural change where we treat "workers" and "triggerer" similarly, as general executors of "sync" and "async" tasks respectively. That's where Airflow is inevitably evolving.
>>>
>>> But we can of course phase things in and out for implementation - even if the AIP should cover both, I think if the goal of the AIP and its preamble is separating "user code" from the "database" as the main reason, it also means the Triggerer, if you ask me (from a design point of view at least).
>>>
>>> Again, implementation can be phased, and even different people and teams might work on those phases/pieces.
>>>
>>> J.
>>>
>>> On Fri, Jul 25, 2025 at 2:29 PM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:
>>>
>>> > > > #2. Yeah, we would need something similar for triggerers as well, but that can be done as part of a different AIP
>>> >
>>> > > You won't achieve your goal of "true" isolation of user code if you don't do triggerer. I think if the goal is to achieve it - it should cover both.
>>> >
>>> > My bad, should've explained our architecture for triggers as well, apologies.
>>> > So here it is:
>>> >
>>> >    - Triggers would be running on a centralized service, so all the Trigger classes will be part of the platform team's repo and not the customer's repo
>>> >    - The triggers won't be able to use any libs other than standard ones that are already used in core Airflow (like requests, etc.)
>>> >    - As we are the owners of the core Airflow repo, customers have to get our approval to land any class in this path (unlike the dags repo, which they own)
>>> >    - When a customer's task defers, we would have an allowlist on our side to check whether we should do the async polling or not
>>> >    - If the Trigger class isn't part of our repo (allowlist), we just fail the task, as we won't have the code that they used in the trigger class anyway
>>> >    - If any of these conditions aren't suitable for you (as a customer), feel free to use sync tasks only
>>> >
>>> > But in general, I agree with making the triggerer svc communicate over APIs only. If that is done, then we can have instances of the triggerer svc running on the customer's side as well, which can process any type of trigger class. Though that's not a blocker for us at the moment, because triggerers are mostly doing just polling using simple libs like requests.
>>> >
>>> > On Fri, Jul 25, 2025 at 5:03 PM Igor Kholopov <ikholo...@google.com.invalid> wrote:
>>> >
>>> > > Thanks Sumit for the detailed proposal. Overall I believe it aligns well with the goals of making Airflow scale well beyond a single-team deployment (and AIP-85 goals), so you have my full support with this one.
>>> > >
>>> > > I've left a couple of clarification requests on the AIP page.
>>> > >
>>> > > Thanks,
>>> > > Igor
>>> > >
>>> > > On Fri, Jul 25, 2025 at 11:50 AM Sumit Maheshwari <sumeet.ma...@gmail.com> wrote:
>>> > >
>>> > > > Thanks Jarek and Ash for the initial review. It's good to know that the DAG processor has some preemptive measures in place to prevent access to the DB. However, the main issue we are trying to solve is not providing DB creds to the customer teams who are using Airflow as a multi-tenant orchestration platform. I've updated the doc to reflect this point as well.
>>> > > >
>>> > > > Answering Jarek's points:
>>> > > >
>>> > > > #1. Yeah, I had forgotten to write about the token mechanism; added that to the doc, but how the token can be obtained (safely) is still open in my mind. I believe the token used by task executors can be created outside of it as well (I may be wrong here).
>>> > > >
>>> > > > #2. Yeah, we would need something similar for triggerers as well, but that can be done as part of a different AIP.
>>> > > >
>>> > > > #3. Yeah, I also believe the API should largely work.
>>> > > >
>>> > > > #4. Added in the AIP that instead of dag_dirs we can work with dag_bundles, and every dag-processor instance would be treated as a different bundle.
>>> > > >
>>> > > > Also, added points around callbacks, as these are also fetched directly from the DB.
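For illustration only, a minimal sketch of the allowlist gate described in the quoted message above (the check done when a customer's task defers). All names here (ALLOWED_TRIGGER_CLASSPATHS, handle_deferral, enqueue_trigger, fail_task) are hypothetical, not Airflow APIs; the real check would hook into the scheduler/triggerer handoff.

    # Hypothetical sketch only - not actual Airflow code. Assumes the trigger
    # classpath is visible at the moment a task defers.
    ALLOWED_TRIGGER_CLASSPATHS = {
        # example classpaths that live in the platform-owned repo
        "airflow.providers.standard.triggers.temporal.DateTimeTrigger",
        "airflow.providers.http.triggers.http.HttpTrigger",
    }

    def fail_task(task_id: str, reason: str) -> None:
        # Placeholder: in reality this would mark the task instance failed via the API.
        print(f"failing {task_id}: {reason}")

    def enqueue_trigger(classpath: str, kwargs: dict) -> None:
        # Placeholder: hand the (classpath, kwargs) pair to the centralized triggerer.
        print(f"enqueuing {classpath} with {kwargs}")

    def handle_deferral(task_id: str, trigger_classpath: str, trigger_kwargs: dict) -> None:
        """Allow async polling only for trigger classes the platform team owns."""
        if trigger_classpath not in ALLOWED_TRIGGER_CLASSPATHS:
            # The centralized triggerer would not have this code anyway, so fail fast.
            fail_task(task_id, f"trigger {trigger_classpath} is not allowlisted")
            return
        enqueue_trigger(trigger_classpath, trigger_kwargs)

Customers whose triggers are not in the allowlist fall back to sync tasks, exactly as described above; the sketch just makes that decision point explicit.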
>>> > > >
>>> > > > On Fri, Jul 25, 2025 at 11:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > > >
>>> > > > > > A clarification to this - the dag parser today is likely not protection against a dedicated malicious DAG author, but it does protect against casual DB access attempts - the db session is blanked out in the parsing process, as are the env var configs https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316 - is this perfect? No, but it's much more than no protection
>>> > > >
>>> > > > Oh absolutely.. This is exactly what we discussed back then in March, I think - and the way we decided to go for 3.0, with full knowledge that it's not protecting against all threats.
>>> > > >
>>> > > > On Fri, Jul 25, 2025 at 8:22 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> > > >
>>> > > > > A clarification to this - the dag parser today is likely not protection against a dedicated malicious DAG author, but it does protect against casual DB access attempts - the db session is blanked out in the parsing process, as are the env var configs https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316 - is this perfect? No, but it's much more than no protection
>>> > > > >
>>> > > > > > On 24 Jul 2025, at 21:56, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > > > > >
>>> > > > > > Currently in the DagFile processor there is no built-in protection against user code from Dag parsing to - for example - read database credentials from the airflow configuration and use them to talk to the DB directly.
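For illustration of the "blanked out" behaviour Ash references above: the linked supervisor.py does the real work, and the sketch below is only a hedged approximation of the idea, not that code. It scrubs DB-related settings from the environment of the parsing subprocess so that casual attempts to rebuild a DB connection from config fail; the env var prefixes are assumptions for the example.

    import os

    # Hedged approximation of the idea, not the actual supervisor.py implementation:
    # blank out anything that would let parsed DAG code rebuild a DB connection.
    _DB_ENV_PREFIXES = ("AIRFLOW__DATABASE__", "AIRFLOW__CORE__SQL_ALCHEMY")

    def scrub_db_settings_from_env() -> None:
        """Empty DB-related env vars before user DAG code runs in the subprocess."""
        for key in list(os.environ):
            if key.startswith(_DB_ENV_PREFIXES):
                os.environ[key] = ""

    if __name__ == "__main__":
        os.environ["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = "postgresql://user:pw@host/airflow"
        scrub_db_settings_from_env()
        # Casual DAG code reading the connection string from the environment now gets nothing.
        assert os.environ["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] == ""

As the thread notes, this stops casual access rather than a dedicated malicious DAG author, which is why the AIP pushes for API-only access and withholding DB creds entirely.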