> Understood. I like the idea of extensibility and "Airflow as a platform." 
> However, we should make sure that we do not worsen the user experience with 
> the extensibility. The "User Management Provider" is something that could 
> potentially make the user experience worse, especially for customers who are 
> self-hosting Airflow. Managed services will ensure that they dedicate 
> resources to maintaining their user management providers. Multi-tenancy will 
> end up becoming a feature for managed service customers, leaving the 74% of 
> Airflow users [1] with a less powerful Airflow. As an example, Timetables is 
> a very powerful feature, which, anecdotally, no customer ends up using due to 
> its complexity.

I do not think this will happen. I think part of the effort should be not
only to implement the API but also to provide a fully fledged (though
simple) implementation of such a provider that works with an
open-source identity implementation; Keycloak is the one that comes to
my mind. It's possibly jumping ahead a bit to say "let's use Keycloak
as a reference provider we can release", but I think Keycloak has all we
need:
* Integration with multiple authentication providers and protocols
* User management:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
* Role management, including user mapping:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
* Group management:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html

It comes with a management console, CLI and much more
(auditing/session management etc. etc.)

In a way, it would provide much the same as what the FAB Security
Manager does, but with a much more complete scope and, most
importantly, it would not be "part of Airflow" the way FAB is. It would
be "outside" of it, and the only thing Airflow would provide is
pointers to the Keycloak docs on how to integrate it with Airflow
as a proxy:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html
(or it could be done by writing an Airflow Keycloak adapter; it remains
to be decided which would be easier to maintain). Users will be free to
configure the Keycloak proxy as they see fit. No DB would be needed in
Airflow to manage any of this: no UI, no API, no CLI. All of that would
be delegated out and integrated via incoming headers or the adapter.
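To make the "integrated via incoming headers" idea a bit more concrete, here is a minimal sketch of what the Airflow side could look like, assuming an authenticating proxy in front of the webserver that injects the user name and a comma-separated group list. The header names `X-Forwarded-User` / `X-Forwarded-Groups`, the `Identity` class and `identity_from_headers` are all hypothetical illustrations, not an existing Airflow or Keycloak API.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Mapping, Optional

# Hypothetical header names: use whatever the configured proxy actually sets.
USER_HEADER = "X-Forwarded-User"
GROUPS_HEADER = "X-Forwarded-Groups"


@dataclass(frozen=True)
class Identity:
    username: str
    groups: FrozenSet[str] = field(default_factory=frozenset)


def identity_from_headers(headers: Mapping[str, str]) -> Optional[Identity]:
    """Build an identity from headers injected by an authenticating proxy.

    Returns None when the user header is missing or empty. HTTP header names
    are case-insensitive, so lookups are done on lower-cased keys.

    NOTE: this function does not verify that the request really came through
    the proxy; that must be guaranteed by the deployment (for example, the
    webserver is reachable only via the proxy, which overwrites these headers).
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    username = normalized.get(USER_HEADER.lower(), "").strip()
    if not username:
        return None
    raw_groups = normalized.get(GROUPS_HEADER.lower(), "")
    groups = frozenset(g.strip() for g in raw_groups.split(",") if g.strip())
    return Identity(username=username, groups=groups)


if __name__ == "__main__":
    print(identity_from_headers({"x-forwarded-user": "john", "X-Forwarded-Groups": "team-a, admins"}))
    print(identity_from_headers({}))
```

The point of this shape is that Airflow keeps no user table at all: everything it knows about a user arrives with each request, and trusting those headers becomes a deployment concern rather than something Airflow has to manage.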

The users will have several choices:

1) Existing users, or those who want to keep everything in the Airflow
UI, could use the FAB provider (which will be separated from the core).
Same as today, but without the advanced management features for groups
and tenants. We might consider dropping it altogether eventually.
2) If they are on-premises, they can use a Keycloak provider by
following our advice/suggestions/simple guidelines on how to
integrate. They would have to manage their own Keycloak instance (it
won't be a "standard" part of Airflow).
3) If they run on AWS/Azure/GCP/others, each cloud would (hopefully)
develop its own provider to integrate with IAM etc., and they could
use that provider directly. Or they could run and manage their own
Keycloak in the cloud as they see fit (it supports OAuth integration
with all the clouds). Or they could develop their own provider.
4) Those on managed services will have no choice but to use the
provider installed by their service.
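The choices above all boil down to different implementations behind one narrow interface. A minimal sketch, assuming Airflow core would only ever ask "may this user perform this action on this resource?". `UserManagementProvider`, `GroupMappingProvider`, `DenyAllProvider`, `load_provider` and the `("can_edit", "dag:dag1")` permission naming are hypothetical, chosen to mirror the "John can-edit dag1 and can-view dag2" example from earlier in the thread.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Set, Tuple, Type

# An (action, resource) pair such as ("can_edit", "dag:dag1").
# The naming scheme is purely illustrative.
Permission = Tuple[str, str]


class UserManagementProvider(ABC):
    """Hypothetical interface a pluggable user management provider might expose.

    Core only asks for a yes/no decision; where users, groups and roles are
    stored (FAB tables, Keycloak, a cloud IAM) is entirely up to the provider.
    """

    @abstractmethod
    def is_authorized(self, user: str, groups: Iterable[str], action: str, resource: str) -> bool:
        raise NotImplementedError


class GroupMappingProvider(UserManagementProvider):
    """Authorizes based on groups asserted by an external IdP (e.g. Keycloak).

    In a real setup the group -> permissions mapping would live in the external
    system; it is passed in as a plain dict here to keep the sketch self-contained.
    """

    def __init__(self, group_permissions: Dict[str, Set[Permission]]) -> None:
        self._group_permissions = group_permissions

    def is_authorized(self, user: str, groups: Iterable[str], action: str, resource: str) -> bool:
        wanted = (action, resource)
        return any(wanted in self._group_permissions.get(g, set()) for g in groups)


class DenyAllProvider(UserManagementProvider):
    """Fail-closed fallback, e.g. when no provider has been configured."""

    def is_authorized(self, user: str, groups: Iterable[str], action: str, resource: str) -> bool:
        return False


# Hypothetical registry: a deployment would pick one entry via a config value.
PROVIDERS: Dict[str, Type[UserManagementProvider]] = {
    "group-mapping": GroupMappingProvider,
    "deny-all": DenyAllProvider,
}


def load_provider(name: str, **kwargs) -> UserManagementProvider:
    """Instantiate the provider registered under `name` (KeyError if unknown)."""
    return PROVIDERS[name](**kwargs)


if __name__ == "__main__":
    # "John can-edit dag1 and can-view dag2", expressed through his group.
    provider = load_provider(
        "group-mapping",
        group_permissions={"team-a": {("can_edit", "dag:dag1"), ("can_read", "dag:dag2")}},
    )
    print(provider.is_authorized("john", ["team-a"], "can_edit", "dag:dag1"))  # True
    print(provider.is_authorized("john", ["team-a"], "can_edit", "dag:dag2"))  # False
    print(provider.is_authorized("john", ["team-a"], "can_read", "dag:dag2"))  # True
```

A FAB provider, a Keycloak provider or a cloud IAM provider would each just be another subclass; nothing in core has to change when a user switches between the options listed above.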

I think all of that gives users the choice: if they want role
management and multi-tenant capabilities, fine, but they will have to
manage the users outside of Airflow and integrate Airflow with that (and
they can either integrate with what they already have or use
Keycloak). And it does not really impair them.

J,


On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham
<shu...@amazon.com.invalid> wrote:
>
> Thanks, Kaxil – that helped to clarify the proposal a bit more.
>
> > Replacing Access Control provided by FAB with a base/core security model 
> > (that is still resource-based)
>
> Are you suggesting that we build this resource-driven security model directly 
> into Airflow, without relying on external dependencies like FAB?
>
> > Extend this to the other Airflow components (scheduler, workers, triggerer,
> > CLI)
>
> Are there cases where the scheduler or CLI would require the authorization 
> API? Since they are considered trusted components, I assumed they would not 
> need it.
>
>
> Jarek - as always, I appreciate you sharing your thoughts and having an open 
> discussion.
>
> > Which really explains what "Airflow as a Platform" is all about. I do not 
> > think we already know all the parts that should be converted into "Airflow 
> > extendability". It's more of an incremental effort like that where we have 
> > those bright ideas "Hey - this part can be removed and delegated to 
> > others".  I think this has never been formulated explicitly but I think for 
> > quite a while we are really in the mode where we think much more about what 
> > we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
>
> Understood. I like the idea of extensibility and "Airflow as a platform." 
> However, we should make sure that we do not worsen the user experience with 
> the extensibility. The "User Management Provider" is something that could 
> potentially make the user experience worse, especially for customers who are 
> self-hosting Airflow. Managed services will ensure that they dedicate 
> resources to maintaining their user management providers. Multi-tenancy will 
> end up becoming a feature for managed service customers, leaving the 74% of 
> Airflow users [1] with a less powerful Airflow. As an example, Timetables is 
> a very powerful feature, which, anecdotally, no customer ends up using due to 
> its complexity.
>
> I am still unclear about other user scenarios related to user management, 
> besides multi-tenancy, that Airflow customers are looking to enable. While 
> the extensibility we aim for will enable this, is there a need for it? Also, 
> @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested 
> in building a custom user management provider that works with your platform? 
> Have there been cases where your customers were limited by the current 
> permissioning model, and you considered replacing FAB?
>
> I believe that the primary motivation for "user management provider" is 
> driven by the excitement around getting rid of FAB, which I think we can 
> still achieve while including multi-tenancy in the core Airflow. Both should 
> be treated as separate problems.
>
> References:
> 1. 
> https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
>
> On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>
>
>
>     Comment on Shubham's question:
>
>     > In addition, are there any other user scenarios, beyond multi-tenancy,
>     > that Airflow users are looking to enable and that require this
>     > pluggability? Asking as I haven't come across them. Overall, I believe
>     > we need more information on your proposal before seeking feedback from
>     > the community. Could we work together during February to develop a
>     > concrete proposal?
>
>     I am glad you asked. I think this is one of the things I wanted to
>     achieve by adding this page:
>
>     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
>
>     It will be live in 2.6, and one of its main parts is this one:
>
>     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
>
>     Which really explains what "Airflow as a Platform" is all about. I do
>     not think we already know all the parts that should be converted into
>     "Airflow extensibility". It's more of an incremental effort, where we
>     have those bright ideas: "Hey, this part can be removed and delegated
>     to others". I think this has never been formulated explicitly, but for
>     quite a while we have really been in a mode where we think much more
>     about what we can SPLIT OUT from Airflow rather than what we can ADD
>     to Airflow.
>
>     When you look at it, this is also the main idea behind the OpenLineage
>     integration, for example: we are adding OpenLineage (which is really
>     just an API) so that others can build "everything-lineage" on top of
>     it. We are adding the minimum possible set of APIs and integrations
>     to expose the lineage capability, so that the lineage "UI" and the
>     other use cases that lineage enables can be built outside. We are in a
>     strong position to do it, being sure that when we expose it, others
>     will implement the integrations they care about.
>
>     I think more and more (and this has been preached mostly by Ash, but
>     also by others) that we should focus solely on being an extremely
>     powerful and robust scheduler and make sure we expose everything that
>     can be exposed as an external API (while still providing a basic
>     implementation that keeps Airflow a "finished" product that can handle
>     basic cases).
>
>     BTW, we are now preparing for the Airflow Summit CFP (some
>     announcements will follow shortly; I do not want to spill too many
>     beans), and we have a very interesting, broad category: "Airflow and
>     ...". I think we should work in the direction where the `...` is far
>     bigger than Airflow itself.
>
>     J.
>
>     On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>     >
>     > Great idea Vikram, I love the idea of making this a provider/pluggable.
>     >
>     > In some ways, we already have a pluggable mechanism for authentication
>     > with Auth Backends [1]. Where I think we will need a lot more work is:
>     >
>     > * Replacing Access Control provided by FAB with a base/core security
>     > model (that is still resource-based) [2]
>     > * Extend this to the other Airflow components (scheduler, workers,
>     > triggerer, CLI) or make them all driven by a single API that takes care
>     > of auth. This will also reduce a lot of code duplication across many
>     > of the components.
>     > * For backward compatibility, we could ship a FAB provider that still
>     > uses Flask AppBuilder, in addition to our recommended provider that
>     > will have more features, and users/companies/stakeholders can build on
>     > top of that provider to extend it further.
>     >
>     >
>     > References:
>     > [1]: 
> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
>     > [2]: 
> https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
>     >
>     > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <shu...@amazon.com.invalid> wrote:
>     >>
>     >> Hi Vikram,
>     >> Thank you for taking the time to review the proposal. I appreciate
>     >> your insights; I will make sure to reach out to you directly in the
>     >> future for feedback, as that would undoubtedly have saved us some time
>     >> and effort.
>     >>
>     >> In regard to the separation of user management, I understand your
>     >> concerns and, at a high level, I agree with you. However, I think it
>     >> would be beneficial to have more details on how it will work. Here are
>     >> a few questions that come to mind:
>     >> 1. How will the user-id/group-id interface interact with Airflow
>     >> resource-level permissions? Which parts of "John can-edit dag1 and
>     >> can-view dag2" will be part of Airflow core? What will be exposed to
>     >> the external system?
>     >> 2. Who will be responsible for managing the resource-level
>     >> permissions? Will it be the external system?
>     >> 3. What are the limitations of this new pluggable model compared to
>     >> FAB? Will there be restrictions on the granularity of resource access
>     >> that Airflow admins can provide to their users?
>     >> 4. As Jarek pointed out, with this change we want to make
>     >> authorization externally driven. Will this have a significant impact
>     >> on Airflow performance, as authorization will be required for fetching
>     >> variables, executing tasks, etc.?
>     >> 5. What will the migration process look like for existing users moving
>     >> to this non-FAB pluggable model?
>     >>
>     >> In addition, are there any other user scenarios, beyond multi-tenancy,
>     >> that Airflow users are looking to enable and that require this
>     >> pluggability? Asking as I haven't come across them. Overall, I believe
>     >> we need more information on your proposal before seeking feedback from
>     >> the community. Could we work together during February to develop a
>     >> concrete proposal?
>     >>
>     >> Besides this, I would like to propose that we define the scope and
>     >> long-term vision of "Airflow core". To achieve this, it may be helpful
>     >> to first outline the perspectives of the Airflow PMC members. Recently,
>     >> there have been discussions regarding the separation of executors into
>     >> a separate package, the implementation of pluggable schedulers, and
>     >> other related topics. Currently, these decisions and discussions are
>     >> somewhat ad hoc and are made through the mailing list. I would be happy
>     >> to collaborate and invest time in this effort.
>     >>
>     >> Regards
>     >> Shubham
>     >>
>     >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>
>     >>
>     >>
>     >>
>     >>     Hey Vikram,
>     >>
>     >>     I think it's brilliant, and I wonder how it had not occurred
>     >>     to us earlier. I believe that is due to the natural tendency
>     >>     of "doing as we always did" rather than thinking completely
>     >>     out of the box. Thanks, Vikram, for bringing it up.
>     >>
>     >>     The funny thing is that when I see this:
>     >>
>     >>     > However, I don't agree that this level of user management
>     >>     > belongs in "Core Airflow".
>     >>
>     >>     I almost immediately think: NOOOOO, why? It's always been here,
>     >>     how can we remove it?
>     >>
>     >>     But then if you look a bit closer:
>     >>
>     >>     > think this is a time to consider the concept of a "user
>     >>     > management provider" with a simple built-in implementation
>     >>     > being the current Airflow functionality, enabling alternate more
>     >>     > complex (but separate) implementations such as your proposal
>     >>     > here as alternate user management providers.
>     >>
>     >>     Then it starts to make way more sense. Way more.
>     >>
>     >>     And when you look further:
>     >>
>     >>     > Maybe, this also enables us to get rid of the FAB security
>     >>     > manager from core Airflow?
>     >>
>     >>     My heart jumps and I am immediately sold on the idea.
>     >>
>     >>     When I was commenting on the doc initially, something did not
>     >>     feel right. I had a feeling it was probably the 5th time I was
>     >>     looking at and commenting on a similar document. And, well, it
>     >>     was, actually. Most of the things we discussed there are already
>     >>     implemented out there. We just need to make sure we expose
>     >>     enough of the API to use them. For example, we have Keycloak,
>     >>     which is an open-source implementation of Identity and Access
>     >>     Management, with everything out there already integrated, and
>     >>     I've been part of the project that integrated just the
>     >>     authentication part. Now if we rethink the authorization and make
>     >>     it simpler and "externally driven", this will not only be faster
>     >>     IMHO, but will also allow enterprise users to integrate much
>     >>     better.
>     >>
>     >>     I believe following the path that Vikram outlined will be a good
>     >>     direction for everyone in the community, including all the
>     >>     Managed Service providers, who will have a far easier job
>     >>     integrating Airflow into their authentication models.
>     >>
>     >>     J.
>     >>
>     >>
>     >>
>     >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
>     >>     <vik...@astronomer.io.invalid> wrote:
>     >>     >
>     >>     > Shubham and Vincent,
>     >>     >
>     >>     > Let me start by saying that I apologize for my delayed response
>     >>     > to your original email.
>     >>     >
>     >>     > I appreciate the detailed write-up and the thought behind it. I
>     >>     > completely agree with your use case and understand how this is
>     >>     > applicable to enterprises with multiple data teams using Airflow.
>     >>     >
>     >>     > However, I don't agree that this level of user management belongs
>     >>     > in "Core Airflow".
>     >>     >
>     >>     > I strongly believe that the core Airflow mission is for the
>     >>     > community at large and for data practitioners, either individuals
>     >>     > or teams within enterprises. And therefore, I don't disagree with
>     >>     > the intent of making it easier for enterprise teams to adopt
>     >>     > Airflow. But I think there is a never-ending list of user
>     >>     > management features needed to support enterprise needs. We have
>     >>     > already struggled with this over time and faced challenges with
>     >>     > the FAB security manager and its integration in Airflow.
>     >>     >
>     >>     > I think we should use this opportunity and your use case to
>     >>     > "separate the user management" from Core Airflow, outside of the
>     >>     > absolute basics. I think this is a time to consider the concept
>     >>     > of a "user management provider" with a simple built-in
>     >>     > implementation being the current Airflow functionality, enabling
>     >>     > alternate more complex (but separate) implementations such as
>     >>     > your proposal here as alternate user management providers.
>     >>     > Maybe, this also enables us to get rid of the FAB security
>     >>     > manager from core Airflow?
>     >>     >
>     >>     > Best regards,
>     >>     > Vikram
>     >>     >
>     >>     >
>     >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
>     >>     >>
>     >>     >> Thanks
>     >>     >>
>     >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>     >>
>     >>     >>
>     >>     >>
>     >>     >>
>     >>     >>     Added.
>     >>     >>
>     >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
>     >>     >>     <vincb...@amazon.com.invalid> wrote:
>     >>     >>     >
>     >>     >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
>     >>     >>     >
>     >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>     >>     >
>     >>     >>     >
>     >>     >>     >
>     >>     >>     >
>     >>     >>     >     What's your cwiki ID, Vincent? (I'll add you without going into details yet.)
>     >>     >>     >
>     >>     >>
>     >>
>
