Gentle reminder for everyone to review the proposal.

Updated link:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-92+Isolate+DAG+processor%2C+Callback+processor%2C+and+Triggerer+from+core+services

On Tue, Jul 29, 2025 at 4:37 PM Sumit Maheshwari <sumeet.ma...@gmail.com>
wrote:

> Thanks everyone for reviewing this AIP. As Jarek and others suggested, I
> expanded the scope of this AIP and divided it into three phases. With the
> increased scope, the boundary between this AIP and AIP-85 has become a
> little blurrier, but I believe these are still two distinct enhancements.
>
>
>
> On Fri, Jul 25, 2025 at 10:51 PM Sumit Maheshwari <sumeet.ma...@gmail.com>
> wrote:
>
>> Yeah, overall it makes sense to include Triggers as part of this AIP and
>> phase the implementation. Though I didn't exclude Triggers because "Uber"
>> doesn't need them - I just thought of keeping the scope of development
>> small and achievable, just like it was done in Airflow 3 by isolating
>> only workers and not the DAG-processor & Triggers.
>>
>> But if you think Triggers should be part of this AIP itself, then I can
>> include them in it as well.
>>
>> On Fri, Jul 25, 2025 at 7:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I would very much prefer that the architectural choices of this AIP be
>>> based on "general public" needs rather than "Uber needs", even if Uber
>>> will be implementing it - so from my point of view having Trigger
>>> separation as part of it is quite important.
>>>
>>> But it's not even just that.
>>>
>>> We've been discussing, for example for Deadlines (being implemented by
>>> Dennis and Ramit), the possibility of short, notification-style
>>> "deadlines" being sent to the triggerer for execution - this is well
>>> advanced now, and whether you want it or not, Dag-provided code might be
>>> serialized and sent to the triggerer for execution. This is part of our
>>> "broader" architectural change where we treat "workers" and "triggerer"
>>> similarly, as general executors of "sync" and "async" tasks
>>> respectively. That's the direction Airflow is inevitably evolving in.
>>>
>>> But we can of course phase things in for implementation - even if the
>>> AIP should cover both. I think if the goal of the AIP and its preamble
>>> is separating "user code" from "database" as the main reason, it also
>>> means the Triggerer if you ask me (from a design point of view at least).
>>>
>>> Again implementation can be phased and even different people and teams
>>> might work on those phases/pieces.
>>>
>>> J.
>>>
>>> On Fri, Jul 25, 2025 at 2:29 PM Sumit Maheshwari <sumeet.ma...@gmail.com>
>>> wrote:
>>>
>>> > > > #2. Yeah, we would need something similar for triggerers as well,
>>> > > > but that can be done as part of a different AIP
>>> > >
>>> > > You won't achieve your goal of "true" isolation of user code if you
>>> > > don't do triggerer. I think if the goal is to achieve it - it should
>>> > > cover both.
>>> >
>>> >
>>> > My bad, should've explained our architecture for triggers as well,
>>> > apologies. So here it is:
>>> >
>>> >
>>> >    - Triggers would be running on a centralized service, so all the
>>> >    Trigger classes will be part of the platform team's repo and not
>>> >    the customer's repo
>>> >    - The triggers won't be able to use any libs other than the
>>> >    standard ones that are already used in core Airflow (like requests,
>>> >    etc.)
>>> >    - As we are the owners of the core Airflow repo, customers have to
>>> >    get our approval to land any class in this path (unlike the dags
>>> >    repo, which they own)
>>> >    - When a customer's task defers, we would check an allowlist on our
>>> >    side to decide whether to do the async polling or not (see the
>>> >    sketch after this list)
>>> >    - If the Trigger class isn't part of our repo (allowlist), we just
>>> >    fail the task, as we won't have the code that they used in the
>>> >    trigger class anyway
>>> >    - If any of these conditions don't suit you (as a customer), feel
>>> >    free to use sync tasks only
>>> >
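>>> > To make the allowlist check concrete, here is a rough sketch of the
>>> > idea (all names here - TRIGGER_ALLOWLIST, handle_deferral, fail_task,
>>> > enqueue_trigger - are made up for illustration; this is not real
>>> > Airflow or internal API):
>>> >
>>> >     # Hypothetical gate in front of the centralized triggerer service.
>>> >     TRIGGER_ALLOWLIST = {
>>> >         # Only trigger classes that live in the platform team's repo.
>>> >         "platform_triggers.http.HttpPollTrigger",
>>> >         "platform_triggers.s3.S3KeyPollTrigger",
>>> >     }
>>> >
>>> >     def fail_task(task_instance_id: str, reason: str) -> None:
>>> >         # Stand-in for whatever mechanism marks the task as failed.
>>> >         print(f"failing {task_instance_id}: {reason}")
>>> >
>>> >     def enqueue_trigger(task_instance_id: str, classpath: str,
>>> >                         kwargs: dict) -> None:
>>> >         # Stand-in for registering the trigger for async polling.
>>> >         print(f"registering {classpath} for {task_instance_id}")
>>> >
>>> >     def handle_deferral(task_instance_id: str, trigger_classpath: str,
>>> >                         trigger_kwargs: dict) -> None:
>>> >         if trigger_classpath not in TRIGGER_ALLOWLIST:
>>> >             # We don't have the customer's trigger code on the
>>> >             # centralized service, so async polling is impossible -
>>> >             # fail the task right away.
>>> >             fail_task(task_instance_id,
>>> >                       reason=f"{trigger_classpath} is not allowlisted")
>>> >             return
>>> >         # Otherwise register the trigger for async polling as usual.
>>> >         enqueue_trigger(task_instance_id, trigger_classpath, trigger_kwargs)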
>>> >
>>> > But in general, I agree with making the triggerer service communicate
>>> > over APIs only. If that is done, then we can have instances of the
>>> > triggerer service running on the customer's side as well, which can
>>> > process any type of trigger class. Though that's not a blocker for us
>>> > at the moment, because triggerers are mostly doing just polling using
>>> > simple libs like requests.
>>> >
>>> >
>>> >
>>> > On Fri, Jul 25, 2025 at 5:03 PM Igor Kholopov
>>> > <ikholo...@google.com.invalid> wrote:
>>> >
>>> > > Thanks Sumit for the detailed proposal. Overall I believe it aligns
>>> > > well with the goals of making Airflow scale well beyond a single-team
>>> > > deployment (and with the AIP-85 goals), so you have my full support
>>> > > on this one.
>>> > >
>>> > > I've left a couple of clarification requests on the AIP page.
>>> > >
>>> > > Thanks,
>>> > > Igor
>>> > >
>>> > > On Fri, Jul 25, 2025 at 11:50 AM Sumit Maheshwari
>>> > > <sumeet.ma...@gmail.com> wrote:
>>> > >
>>> > > > Thanks Jarek and Ash for the initial review. It's good to know that
>>> > > > the DAG processor has some preventive measures in place to block
>>> > > > access to the DB. However, the main issue we are trying to solve is
>>> > > > avoiding handing out DB creds to the customer teams who are using
>>> > > > Airflow as a multi-tenant orchestration platform. I've updated the
>>> > > > doc to reflect this point as well.
>>> > > >
>>> > > > Answering Jarek's points,
>>> > > >
>>> > > > #1. Yeah, I had forgotten to write about the token mechanism; I've
>>> > > > added that to the doc, but how the token can be obtained (safely)
>>> > > > is still open in my mind. I believe the token used by task
>>> > > > executors can be created outside of it as well (I may be wrong
>>> > > > here).
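>>> > > >
>>> > > > Just to illustrate what I mean by "created outside of it" - a
>>> > > > purely illustrative sketch (not the actual Task SDK token flow; the
>>> > > > secret handling, claim names and helper are assumptions) of the
>>> > > > platform minting a short-lived, bundle-scoped token and injecting
>>> > > > it into the dag-processor environment, where the API server
>>> > > > verifies it with the same secret:
>>> > > >
>>> > > >     import datetime
>>> > > >     import jwt  # PyJWT
>>> > > >
>>> > > >     SHARED_SECRET = "replace-with-a-real-secret"  # e.g. from a secrets manager
>>> > > >
>>> > > >     def mint_component_token(component: str, bundle: str,
>>> > > >                              ttl_minutes: int = 15) -> str:
>>> > > >         now = datetime.datetime.now(datetime.timezone.utc)
>>> > > >         claims = {
>>> > > >             "sub": component,   # e.g. "dag-processor"
>>> > > >             "bundle": bundle,   # limit the token to one dag bundle
>>> > > >             "iat": now,
>>> > > >             "exp": now + datetime.timedelta(minutes=ttl_minutes),
>>> > > >         }
>>> > > >         return jwt.encode(claims, SHARED_SECRET, algorithm="HS256")
>>> > > >
>>> > > >     token = mint_component_token("dag-processor", bundle="team-a-dags")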
>>> > > >
>>> > > > #2. Yeah, we would need something similar for triggerers as well,
>>> > > > but that can be done as part of a different AIP.
>>> > > >
>>> > > > #3. Yeah, I also believe the API should largely work as-is.
>>> > > >
>>> > > > #4. Added to the AIP that instead of dag_dirs we can work with
>>> > > > dag_bundles, and every dag-processor instance would be treated as a
>>> > > > different bundle.
>>> > > >
>>> > > > Also, added points around callbacks, as these are also fetched
>>> > > > directly from the DB.
>>> > > >
>>> > > > On Fri, Jul 25, 2025 at 11:58 AM Jarek Potiuk <ja...@potiuk.com>
>>> > > > wrote:
>>> > > >
>>> > > > > A clarification to this - the dag parser today is likely not
>>> > > > > protection against a dedicated malicious DAG author, but it does
>>> > > > > protect against casual DB access attempts - the db session is
>>> > > > > blanked out in the parsing process, as are the env var configs:
>>> > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
>>> > > > > - is this perfect? No, but it's much more than no protection
>>> > > > > Oh absolutely. This is exactly what we discussed back then in
>>> > > > > March, I think - and that's the way we decided to go for 3.0,
>>> > > > > with full knowledge that it's not protecting against all threats.
>>> > > > >
>>> > > > > On Fri, Jul 25, 2025 at 8:22 AM Ash Berlin-Taylor <a...@apache.org>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > A clarification to this - the dag parser today is likely not
>>> > > > > > protection against a dedicated malicious DAG author, but it
>>> > > > > > does protect against casual DB access attempts - the db session
>>> > > > > > is blanked out in the parsing process, as are the env var
>>> > > > > > configs:
>>> > > > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
>>> > > > > > - is this perfect? No, but it's much more than no protection
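>>> > > > > >
>>> > > > > > To spell out the idea (illustration only, not the supervisor.py
>>> > > > > > code linked above - the exact env var list and the helper name
>>> > > > > > are assumptions): before running user parsing code, the
>>> > > > > > subprocess drops anything that would let that code reach the
>>> > > > > > metadata DB directly, roughly like:
>>> > > > > >
>>> > > > > >     import os
>>> > > > > >
>>> > > > > >     # AIRFLOW__{SECTION}__{KEY} is the standard config-override
>>> > > > > >     # convention; which keys to blank is an assumption here.
>>> > > > > >     SENSITIVE_ENV_PREFIXES = (
>>> > > > > >         "AIRFLOW__DATABASE__",         # sql_alchemy_conn and friends
>>> > > > > >         "AIRFLOW__CORE__SQL_ALCHEMY",  # legacy location of the DB URI
>>> > > > > >     )
>>> > > > > >
>>> > > > > >     def blank_db_access() -> None:
>>> > > > > >         for key in list(os.environ):
>>> > > > > >             if key.startswith(SENSITIVE_ENV_PREFIXES):
>>> > > > > >                 os.environ.pop(key, None)
>>> > > > > >         # The in-process DB session is blanked out in a similar
>>> > > > > >         # spirit, so casual user code can't just reuse it.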
>>> > > > > >
>>> > > > > > > On 24 Jul 2025, at 21:56, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > > > > > >
>>> > > > > > > Currently in the DagFile processor there is no built-in
>>> > > > > > > protection preventing user code run during Dag parsing from -
>>> > > > > > > for example - reading database credentials from the airflow
>>> > > > > > > configuration and using them to talk to the DB directly.
