Hello Everyone,

I would like to resume discussion on AIP-67. After going through a
number of discussions and clarifications about the scope of Airflow 3,
I rewrote the proposal for AIP-67 with the assumption that we will do
it for Airflow 3 only - and that it will be based on the new proposed
AIP-72 (Task Execution Interface) rather than the Airflow 2-only
AIP-44 (Internal API).

The updated proposal is here:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components

Feel free to comment there in-line or raise your "big" comments here,
but here is the impact of changing the target to Airflow 3:

1) I proposed changing Airflow's configuration to use the more
structured TOML rather than plain "ini" - TOML is a successor of "ini"
and is largely compatible, but it has arrays, tables and nesting, has
good support in Python and is the "de-facto" standard for
configuration now (pyproject.toml and the like). This was far too big
of a change for Airflow 2, but with Airflow 3 it seems very appropriate.

2) By popular request I added "team_id" as a database field - this
has quite a few far-reaching implications and its ripple effect on
Airflow 2 would be far too big for the "limited" multi-team setup -
but since we are going to do full versioning including DB changes in
Airflow 3, this is an opportunity to do it well. The implementation
detail of it will however depend on our choice of supported databases
so there is a little dependency on other decisions here. If we stick
with both Postgres and MySQL we will likely have to restructure the DB
to have synthetic UUID identifiers in order to add both versioning and
multi-team (because of MySQL index limitations).

3) The "proper" team identifier also allows us to expand the scope of
multi-team to cover "per-team" connections and variables. Again,
for the Airflow 2 case we could limit it to only the case where
connections and variables come only from "per-team" secrets - but
since we are going to have DB identifiers and we are going to - anyhow
- reimplement Connections and Variables UI to get rid of FAB models
and implement them in reactive technology, it's only a bit more
complex to add "per-team" access there.

4) AIP-72, due to its "task" isolation, allows dropping the idea of
the "--team" flag from the components. With AIP-72, routing tasks to
particular "team" executors is enough and there is no need to pass
the team information via "--team" flag that was originally supposed to
limit access of the components to only a single team. For Airflow 2
and AIP-44 that was a nice "hack" so that we do not have to carry the
"authorization" information together with the task. But since part of
AIP-72 is to carry the verifiable meta-data that will allow us to
cryptographically verify task provenance, we can drop this hack and
rely on AIP-72 implementation.

5) Since DB isolation is "given" by AIP-72, we do not have to split
the delivery of AIP-67 into two phases (with and without DB isolation)
- it will be delivered as a single "with DB isolation" stage.

Those are the major differences vs. the proposal from May (and as you
might see, it is quite a different scope), and this is really why I
insisted on having the Airflow 2 / Airflow 3 discussion before we
conclude the vote on it.

I will go through the proposal on Thursday during our call as planned
- but feel free to start discussions and comments before.

J.
