Nice, thanks for clarifying all this! Now that I have read the new proposal, I understand why certain decisions were made. The decision to separate the "common" part from the "per team" part makes sense now - it is the traditional paradigm of separating the "control plane" from "compute".
Thanks & Regards,
Amogh Desai

On Mon, Jul 15, 2024 at 8:53 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I got the transcript and chat from the last call (thanks Kaxil!) and it allowed me to answer a few questions that were asked during my presentation about AIP-67. I updated the AIP document, but here is a summary:
>
> 1) What about Pools (asked by Elad, Jed and Jorrick): I thought about it and I propose that pools could have an (optional) team_id added. This will allow users to keep common pools (no team_id assigned) and have team-specific ones. The DAG file processor specific to each team will fail a DAG if it tries to use a pool that is not common and belongs to another team. Each team will also be able to have its own "default_pool" configured. This will give enough flexibility for "common vs. team-exclusive" use of pools.
>
> 2) Isolation for connections (John, Filip, Elad, Kaxil, Amogh, Ash): yes, that is part of the design. Connections and variables can be accessed per team - AIP-72 will only provide tasks with the connections that belong to their team. Ash mentioned OPA (which might be used for that purpose). Exactly how it will be implemented is not yet defined in enough detail, but it can use the very mechanisms of AIP-72 - only allowing "global" connections and "my team" connections to be passed by the AIP-72 API to the task and the DAG file processor.
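To make point 2) a bit more concrete, here is a minimal sketch of what team-scoped connection filtering could look like. This is only an illustration of the idea under the assumptions above - the Connection shape and the function name are hypothetical placeholders, not the actual AIP-72 API:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Connection:
        conn_id: str
        team_id: Optional[str]  # None marks a "global" connection shared by all teams

    def connections_for_team(connections: list[Connection], team_id: str) -> list[Connection]:
        # A task (or DAG file processor) of a given team only ever receives
        # the global connections plus the ones owned by its own team.
        return [c for c in connections if c.team_id is None or c.team_id == team_id]

    conns = [
        Connection("shared_s3", None),
        Connection("team_a_db", "team_a"),
        Connection("team_b_db", "team_b"),
    ]
    print([c.conn_id for c in connections_for_team(conns, "team_a")])
    # -> ['shared_s3', 'team_a_db']

The same predicate would presumably apply to variables.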
> 3) Whether "team = deployment" (Igor/Vikram)? -> It depends on what you understand by deployment. I'd say "sub-deployment" - each deployment in a "multi-team" environment will consist of the "common" part, and each team will have its own part (where configuration and management of such team deployment parts will be delegated to the team deployment manager). For example, such deployment managers will be able to build and publish the environment (for example container images) used by team A to run Airflow, or change team-specific configuration.
>
> 4) "This seems like quite a lot of work to share a scheduler and a web server. What's the net benefit of this complexity?" (Ash, John, Amogh, Maciej): Yes, I absolutely see it as a valuable option. It reflects the organizational structure and needs of many of our users, who want to manage part of the environment centrally - monitoring what's going on in all of their teams, and handling things like Airflow upgrades and security - while delegating control of environments and resources down to their teams. This is the need I've heard from many users who have a "data platform team" that makes Airflow available to several other teams. I think the proposal I have is a nice middle ground that follows Conway's law - that the architecture of your system should reflect your organizational structure. What I separated out as "common" parts is precisely what the "data platform team" would like to manage, while the "team environment" is something the data platform team should (and wants to) delegate to its teams.
>
> 5) "I am a little surprised by a shared dataset" (Vikram/Elad): The datasets are defined by their URLs and as such they don't have "ownership". As I see it, what really matters is who can trigger a DAG, and the controls I proposed allow the DAG author to specify "in this DAG it's also OK when a different (specified) team triggered the dataset event". But I left a note that this depends on AIP-73 "Expanded Data Awareness", and once we get that explained/clarified I am happy to coordinate with Constance and see if we need to do more. Happy to hear more comments on that one.
>
> I reflected the 2 points and 5) in the AIP. Looking forward to more comments on the proposal - in the AIP or here.
>
> J.
>
> On Tue, Jul 9, 2024 at 4:48 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello Everyone,
> >
> > I would like to resume discussion on AIP-67. After going through a number of discussions and clarifications about the scope of Airflow 3, I rewrote the proposal for AIP-67 with the assumption that we will do it for Airflow 3 only - and that it will be based on the newly proposed AIP-72 (Task Execution Interface) rather than the Airflow 2-only AIP-44 Internal API.
> >
> > The updated proposal is here:
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> >
> > Feel free to comment there in-line or raise your "big" comments here, but here is the impact of changing the target to Airflow 3:
> >
> > 1) I proposed changing the configuration of Airflow to use more structured TOML rather than plain "ini" - TOML is a successor of "ini" and is largely compatible, but it has arrays, tables and nesting, has good support in Python, and is the de-facto standard for configuration now (pyproject.toml and the like). This was far too big a change for Airflow 2, but with Airflow 3 it seems very appropriate.
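As a quick aside on 1): tables and arrays of tables are exactly what per-team configuration needs and what plain "ini" cannot express cleanly. A minimal sketch of what such a layout could look like, read with Python's standard library - the [[teams]] table and its keys are made up for illustration, the AIP does not prescribe this layout:

    import tomllib  # stdlib since Python 3.11

    config = tomllib.loads("""
    [core]
    executor = "LocalExecutor"

    # Hypothetical per-team tables - names are placeholders, not from the AIP
    [[teams]]
    name = "team_a"
    default_pool = "team_a_pool"

    [[teams]]
    name = "team_b"
    default_pool = "team_b_pool"
    """)

    for team in config["teams"]:
        print(team["name"], "->", team["default_pool"])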
> > 2) By popular request I added "team_id" as a database field - this has quite a few far-reaching implications, and its ripple effect on Airflow 2 would be far too big for the "limited" multi-team setup - but since we are going to do full versioning including DB changes in Airflow 3, this is an opportunity to do it well. The implementation details will, however, depend on our choice of supported databases, so there is a little dependency on other decisions here. If we stick with both Postgres and MySQL, we will likely have to restructure the DB to use synthetic UUID identifiers in order to add both versioning and multi-team (because of MySQL index limitations).
> >
> > 3) The "proper" team identifier also allows expanding the scope of multi-team to "per-team" connections and variables. Again, in the Airflow 2 case we could limit it to connections and variables that come only from "per-team" secrets - but since we are going to have DB identifiers, and we are going to reimplement the Connections and Variables UI anyhow (to get rid of FAB models and implement them in reactive technology), it's only a bit more complex to add "per-team" access there.
> >
> > 4) AIP-72, due to its "task" isolation, allows dropping the idea of the "--team" flag on the components. With AIP-72, routing tasks to particular "team" executors is enough, and there is no need to pass the team information via a "--team" flag that was originally supposed to limit access of the components to only a single team. For Airflow 2 and AIP-44 that was a nice "hack" so that we did not have to carry the "authorization" information together with the task. But since part of AIP-72 is to carry verifiable metadata that will allow us to cryptographically verify task provenance, we can drop this hack and rely on the AIP-72 implementation.
> >
> > 5) Since DB isolation is "given" by AIP-72, we do not have to split the delivery of AIP-67 into two phases (with and without DB isolation) - it will be delivered as a single "with DB isolation" stage.
> >
> > Those are the major differences vs. the proposal from May - and as you might see, it is quite a different scope - and this is really why I insisted on having the Airflow 2 / Airflow 3 discussion before we conclude the vote on it.
> >
> > I will go through the proposal on Thursday during our call as planned - but feel free to start discussions and comments before.
> >
> > J.
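To close with a sketch of the synthetic-UUID idea from point 2) of the quoted July 9 message: string-typed UUID keys keep indexes small enough for MySQL while leaving room for an optional team_id on resources such as pools (echoing the pools idea from July 15). The models below are illustrative only - table and column names are placeholders, not the actual Airflow 3 schema:

    import uuid

    from sqlalchemy import ForeignKey, String
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

    class Base(DeclarativeBase):
        pass

    class Team(Base):
        __tablename__ = "team"

        # Synthetic UUID primary key stored as a 36-char string, so the same
        # schema works on both Postgres and MySQL despite MySQL index limits.
        id: Mapped[str] = mapped_column(
            String(36), primary_key=True, default=lambda: str(uuid.uuid4())
        )
        name: Mapped[str] = mapped_column(String(250), unique=True)

    class Pool(Base):
        __tablename__ = "pool"

        id: Mapped[str] = mapped_column(
            String(36), primary_key=True, default=lambda: str(uuid.uuid4())
        )
        pool: Mapped[str] = mapped_column(String(256), unique=True)

        # NULL means a "common" pool usable by every team; a value makes
        # the pool exclusive to that team.
        team_id: Mapped[str | None] = mapped_column(ForeignKey("team.id"), nullable=True)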