Re: [DISCUSS] AIP-92 Isolate DAG parsing logic

Sumit Maheshwari Fri, 25 Jul 2025 05:29:50 -0700

> > #2. Yeah, we would need something similar for triggerers as well, but that
> > can be done as part of a different AIP
>
> You won't achieve your goal of "true" isolation of user code if you don't
> do triggerer. I think if the goal is to achieve it - it should cover both.


My bad, should've explained our architecture for triggers as well,
apologies. So here it is:


   - Triggers would be running on a centralized service, so all the Trigger
   classes will be part of the platform team's repo and not the customer's repo
   - The triggers won't be able to use any libs other than std ones, which
   are being used in core Airflow (like requests, etc)
   - As we are the owners of the core Airflow repo, customers have to get
   our approval to land any class in this path (unlike the dags repo which
   they own)
   - When a customer's task defers, we would check an allowlist on our side
   to decide whether we should do the async polling or not
   - If the Trigger class isn't part of our repo (allowlist), we just fail the
   task, since we wouldn't have the code used in that trigger class anyway
   - If any of these conditions aren't suitable for you (as a customer),
   feel free to use sync tasks only
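
A minimal sketch of the allowlist check described in the list above. It
assumes triggers are identified by their import path. Every name here
(ALLOWED_TRIGGERS, DeferDecision, check_deferral, the example classpaths) is
hypothetical and not part of Airflow's API.

```python
from dataclasses import dataclass

# Hypothetical allowlist: classpaths of Trigger classes that live in the
# platform team's repo and are therefore available to the central triggerer.
ALLOWED_TRIGGERS = frozenset(
    {
        "platform_triggers.http.HttpPollTrigger",
        "platform_triggers.time.DateTimeTrigger",
    }
)


@dataclass(frozen=True)
class DeferDecision:
    """Outcome of the check performed when a task asks to defer."""

    accepted: bool
    reason: str


def check_deferral(trigger_classpath: str, allowlist=ALLOWED_TRIGGERS) -> DeferDecision:
    """Accept the deferral only if the trigger class is on the allowlist."""
    if trigger_classpath in allowlist:
        return DeferDecision(True, "trigger is on the platform allowlist")
    # The central service does not have this trigger's code, so the task
    # is failed instead of being polled asynchronously.
    return DeferDecision(
        False,
        f"trigger {trigger_classpath!r} is not on the platform allowlist",
    )
```

A trigger defined in the customer's own dags repo would be rejected by this
check, and the task would then be failed rather than deferred.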


But in general, I agree that the triggerer service should also communicate
over APIs only. Once that is done, we can run instances of the triggerer
service on the customer's side as well, which can process any type of trigger
class. That's not a blocker for us at the moment, though, because triggerers
are mostly just polling using simple libs like requests.



On Fri, Jul 25, 2025 at 5:03 PM Igor Kholopov <ikholo...@google.com.invalid>
wrote:

> Thanks Sumit for the detailed proposal. Overall I believe it aligns well
> with the goals of making Airflow well-scalable beyond a single-team
> deployment (and AIP-85 goals), so you have my full support with this one.
>
> I've left a couple of clarification requests on the AIP page.
>
> Thanks,
> Igor
>
> On Fri, Jul 25, 2025 at 11:50 AM Sumit Maheshwari <sumeet.ma...@gmail.com>
> wrote:
>
> > Thanks Jarek and Ash, for the initial review. It's good to know that the
> > DAG processor has some preemptive measures in place to prevent access
> > to the DB. However, the main issue we are trying to solve is not to provide
> > DB creds to the customer teams, who are using Airflow as a multi-tenant
> > orchestration platform. I've updated the doc to reflect this point as well.
> >
> > Answering Jarek's points,
> >
> > #1. Yeah, had forgot to write about token mechanism, added that in doc, but
> > still how the token can be obtained (safely) is still open in my mind. I
> > believe the token used by task executors can be created outside of it as
> > well (I may be wrong here).
> >
> > #2. Yeah, we would need something similar for triggerers as well, but that
> > can be done as part of a different AIP
> >
> > #3. Yeah, I also believe the API should work largely.
> >
> > #4. Added that in the AIP, that instead of dag_dirs we can work with
> > dag_bundles and every dag-processor instance would be treated as a diff
> > bundle.
> >
> > Also, added points around callbacks, as these are also fetched directly
> > from the DB.
> >
> > On Fri, Jul 25, 2025 at 11:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > > A clarification to this - the dag parser today is likely not protection
> > > > against a dedicated malicious DAG author, but it does protect against
> > > > casual DB access attempts - the db session is blanked out in the parsing
> > > > process, as are the env var configs
> > > >
> > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > - is this perfect no? but it’s much more than no protection
> > >
> > > Oh absolutely.. This is exactly what we discussed back then in March I
> > > think - and the way we decided to go for 3.0 with full knowledge it's not
> > > protecting against all threats.
> > >
> > > On Fri, Jul 25, 2025 at 8:22 AM Ash Berlin-Taylor <a...@apache.org>
> > > wrote:
> > >
> > > > A clarification to this - the dag parser today is likely not protection
> > > > against a dedicated malicious DAG author, but it does protect against
> > > > casual DB access attempts - the db session is blanked out in the parsing
> > > > process, as are the env var configs
> > > >
> > > > https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L274-L316
> > > > - is this perfect no? but it’s much more than no protection
> > > >
> > > > > On 24 Jul 2025, at 21:56, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > > Currently in the DagFile processor there is no built-in protection
> > > > > > against user code from Dag Parsing to - for example - read database
> > > > > > credentials from airflow configuration and use them to talk to DB
> > > > > > directly.
> > > >
> > >
> >
>
