Re: [DISCUSS] AIP-67 Multi-team deployment of Airflow components (reloaded)

Jarek Potiuk Sun, 28 Jul 2024 23:24:15 -0700

Thanks, Vikram.

>
> I think I understand your concept of a team, and the concept of
> "team configuration", very clearly.
> I did find two "per-team configuration principles" somewhat in conflict,
> though:
> 1. *Each team configuration SHOULD be a separate configuration file or
> separate set of environment variables, holding team-specific configuration
> needed by DAG file processor, Workers and Triggerer*
> 2. In a multi-team deployment, Connections and Variables have a (nullable)
> team_id field, which makes them either belong to a specific team or
> be "globally available".
>
Yes. That one needs a bit of clarification. The design has changed from the
original one, where connections and variables were totally separated, and
point 1) is a bit of a left-over. I will rephrase it slightly to add
"(with the exception of common configuration defined at the "global
level")". That change was a result of other comments, where common
connections and variables shared between teams were seen as an important
feature. Also, the original design assumed no connections and variables in
the database, which changed after we moved to Airflow 3-only, because
database structure modifications are now all but "given" by other changes.



> Personally, I would have thought that you would want to standardize
> throughout on either required explicit configuration or allowed implicit
> configuration, rather than mixing the two. But you probably have reasons
> why a mixed approach is suitable here.
>

Yes. Those comments convinced me that this is a pretty valid need in the
"multi-team" case. If we did multi-tenancy, clear separation would indeed
be better. But with multi-team, I can easily imagine some shared
connections and variables that are "organisation" standards shared between
teams (say, an SMTP connection used by all teams to send messages and
notifications).
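A minimal sketch of the visibility rule for the nullable team_id: a team sees its own connections plus the global ones (team_id is None), such as the shared SMTP connection. The record type is a hypothetical, simplified stand-in, not Airflow's Connection model, and the precedence rule when a team-owned and a global connection share a conn_id is an assumption the AIP text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class ConnRecord:
    """Hypothetical, simplified stand-in for a stored connection row.

    This is NOT Airflow's Connection model; only the fields needed to
    illustrate the nullable team_id are included.
    """
    conn_id: str
    team_id: Optional[str]  # None means "globally available"
    host: str


def visible_connections(conns: Sequence[ConnRecord], team_id: str) -> Dict[str, ConnRecord]:
    """Return the connections a team may use, keyed by conn_id.

    Assumption: when a team-owned and a global connection share a conn_id,
    the team-owned one takes precedence.
    """
    visible: Dict[str, ConnRecord] = {}
    # Global (team_id is None) entries go in first...
    for conn in conns:
        if conn.team_id is None:
            visible[conn.conn_id] = conn
    # ...so the team's own entries overwrite any global ones with the same id.
    for conn in conns:
        if conn.team_id == team_id:
            visible[conn.conn_id] = conn
    return visible


# Usage example with made-up data.
conns = [
    ConnRecord("org_smtp", None, "smtp.example.org"),  # shared by all teams
    ConnRecord("warehouse", "team_a", "wh-a.internal"),
    ConnRecord("warehouse", "team_b", "wh-b.internal"),
]
team_a_view = visible_connections(conns, "team_a")
print(sorted(team_a_view))            # ['org_smtp', 'warehouse']
print(team_a_view["warehouse"].host)  # wh-a.internal
```

Team connections never leak to other teams here: team_b's "warehouse" is invisible to team_a, while "org_smtp" is visible to every team.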

>
>
> I assume that the per-team DAG parsing environment is where you see the
> need to integrate with AIP-66 (DAG Bundles ...).
> Is that correct?
>
>
Correct. Also, as discussed around Jed's initial AIP-66 proposal, a "per
bundle" environment based exclusively on a zipped `pip install --target`
archive has many technical traps and defines neither the "management" part
nor the "security scope". The difference here is that both the separation
of DAG parsing per team (a separate DAG file processor) and the common
environment (dependencies) for a team's DAG parsing can still be defined
and maintained by the "Team Deployment Manager". The environment
separation can be done in various ways, same as today (separate image,
separate virtualenv), and it can live in a security perimeter completely
separated from other teams. Basically, it makes it easy to ensure that
parsing and execution of DAGs belonging to the same team use the same
dependencies and environment (potentially different from the dependencies
and environment of other teams).

In a way, it is very similar to what Ping Zhang already proposed (and which
has since been abandoned) in AIP-46 (Runtime isolation for Airflow tasks
and DAG parsing):
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing
Unlike AIP-46, however, it makes no assumptions about the technology used
to prepare the environment (virtualenv, images). The only assumption is
that each team has its own separate environment, managed by the Team
Deployment Manager.

J.
