Hello here.

I applied a number of comments, mostly from TP, Ash and Daniel, who were
concerned about the scope of the AIP (including the DB changes), but also
from Shubham, Nikolas and others.

The small (but most visible) change is the name, as "multi-tenant" might
promise too much. I also talked to at least a few users (including a big
bank I spoke to the previous week, who explained how they implemented
something similar in their deployments). That last conversation assured me
that if we make it easier for users who need to support multiple teams
within their organization, explain to them how they can do it, and provide
workload isolation but also - eventually - DB isolation, it might be a
really useful deployment option for such users.

I updated the document today, hoping that all the feedback is addressed.
The main changes that I implemented:

* The name is changed to "Multi-team deployment of Airflow" and the
"--tenant" flag is changed to "--team" across the board. After some
thought, I also think it's a genuinely better name, as it avoids a lot
of the ambiguities connected with multi-tenancy. Different people have
a different understanding of what "multi-tenancy" is, so it's better to be
as precise as possible about what we are proposing to our users.
* Removed Connections/Variables from the scope altogether. I realized that
we can drop all of it by simply disabling access to the metadata DB (and
UI access) for those, and relying only on Secrets Managers (configured
separately for each team). That cuts down the scope a lot, and it also
nicely solves the problem where each team might have different providers
installed when it comes to the "connections" UI - the Connections /
Variables screens will be gone in a multi-team environment (BTW, that's
what I learned from the big bank - they modified their Airflow to remove
metadata DB Connection/Variables access).
* I proposed to introduce the AIP in two phases - first without DB
isolation, second with DB isolation. I think even phase 1 will be useful
for the users, and it will be way simpler to implement and test. But there
is still value in DB isolation, and I would like to keep it as a
single AIP to vote on, as only then can we have the right "Security"
perimeter in place.
* I proposed to add a very simple Dataset access control mechanism -
following today's implementation, allow anyone to produce any dataset
events, but by default limit triggering to the same "Team", and allow DAG
authors to specify which "teams" can trigger the DAG via dataset.
* I renamed "Internal API" to "GRPC API", as proposed by Daniel in one of
the recent PRs. I think it's a better name and I started to use it
everywhere.
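
To make the Dataset access control idea above a bit more concrete, here is
a minimal sketch of the check I have in mind. It is only an illustration
under my own assumptions: `DagTriggerPolicy`, `allowed_producer_teams` and
`may_trigger` are hypothetical names, not existing Airflow classes or
parameters, and the AIP does not prescribe this shape.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DagTriggerPolicy:
    """Hypothetical per-DAG setting; not an existing Airflow class."""

    # Team that owns the DAG consuming the dataset.
    team: str
    # Other teams the DAG author explicitly allows to trigger this DAG.
    allowed_producer_teams: frozenset[str] = field(default_factory=frozenset)


def may_trigger(policy: DagTriggerPolicy, producer_team: str) -> bool:
    """Decide whether a dataset event from producer_team triggers the DAG.

    Producing the event itself is never restricted here; only the
    triggering of the consuming DAG is checked.
    """
    if producer_team == policy.team:
        # Default: events from the same team always trigger.
        return True
    # Otherwise the DAG author must have opted in for that team.
    return producer_team in policy.allowed_producer_teams
```

With this sketch, a DAG owned by "team-a" is triggered by "team-a" events by
default, and by "team-b" events only if the author lists "team-b".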

I believe that goes quite far in addressing all the concerns raised. I
would love to start a vote on it by the end of the week unless there are
serious concerns.

Feel free to comment here - or in the document.

J.




On Tue, Mar 12, 2024 at 12:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> I have iterated and already got a LOT of comments from a LOT of people
> (Thanks everyone who spent time on it). I'd say the document is out of
> draft already, it very much describes the idea of multi-tenancy that I hope
> we will be voting on some time in the future.
>
> Taking into account that ~30% of people in our survey said they want
> "multi-tenancy" - what I am REALLY interested in is to get honest feedback
> about the proposal. Mainly:
>
> **"Is this the multi-tenancy you were looking for?"**
>
> Or were you looking for different droids (err, tenants) maybe?
>
> I do not want to exercise my Jedi skills to influence your opinion, that's
> why the document is there (and some people say it's nice, readable and
> pretty complete) so that you can judge yourself and give feedback.
>
> The document is here:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-tenant+deployment+of+Airflow+components
>
>
> Feel free to comment here, or in the document. I would love to hear more
> voices, and have some ideas what to do next to validate the idea, so please
> - engage for now - but also expect some follow-ups.
>
> J.
>
>
> On Wed, Mar 6, 2024 at 9:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Sooo.. Seems that it's an AIP time :D I've just published a Draft of
>> AIP-67:
>>
>> Multi-tenant deployment of Airflow components
>>
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BDRAFT%5D+AIP-67+Multi-tenant+deployment+of+Airflow+components
>>
>> This AIP is a bit lighter in detail than the others you could see
>> from Jed, Nikolas and Maciej. This is really a DRAFT / High Level
>> idea of Multi-Tenancy that could be implemented as the follow-up after
>> previous steps of Multi-Tenancy implemented (or being implemented)
>> right now.
>>
>> I decided to - rather than describe all the details now - focus on
>> the concept of Multi-tenancy that I wanted to propose. Most of all
>> explaining the concept, comparing it to current ways of achieving some
>> forms of multi-tenancy and showing benefits and drawbacks of the
>> solution and connected costs (i.e. what complexity we need to add to
>> achieve it).
>>
>> When thinking about Multi-tenancy, I realized a few things:
>>
>> * everyone might understand multi-tenancy differently
>> * some forms of multi-tenancy are achievable even today
>> * but - most of all - I started to ask myself: "Is what we can do
>> enough for some sufficiently numerous group of users to call it a
>> useful feature for them?"
>>
>> So before we get into more details - my aim is to make sure we are all
>> on the same page on what we CAN do as a multi-tenancy, and eventually
>> to decide whether we SHOULD do it.
>>
>> Have fun. Bring in comments and feedback.
>>
>> More about all the currently active AIPs at today's Town Hall
>>
>> BTW. Do you think it's a surprise that 5 AIPs were announced just
>> before the Town Hall? I think not :D
>>
>> J.
>>
>
