[
https://issues.apache.org/jira/browse/HDDS-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324179#comment-17324179
]
Prashant Pogde edited comment on HDDS-4944 at 4/17/21, 5:49 PM:
----------------------------------------------------------------
Thank you István Fajth for the detailed summary; it is good to see consensus on
various points. I will just go over the concerns raised here.
> But I share Marton's concern about lack of discussion of other possibilities,
> and the lack of clarity on why we choose to implement it
> as a layer on top of what we have, I also think that it would be nice to
> somehow shed more light on the pros and cons regarding
> these fundamental decisions.........
I believe we did discuss the alternative Marton brought up, implementing
multi-tenancy by running a separate S3G instance for each tenant. It was not
pursued further for the following reasons:
* It makes the multi-tenancy feature dependent on deployment topology.
* Every time a customer needs to add another tenant, they would have to launch
another S3G instance, making users of the system dependent on the operations
team.
* The onus of tracking which S3G instance serves which tenant lies with the
customer and the application.
* It also adds complexity to debugging and diagnosis when things go wrong.
* It further increases the burden of providing HA for the S3G service, because
every instance would need its own HA.
* It would limit the multi-tenancy feature to S3 users only; it could not
easily be extended to Kerberos or OIDC users in the future.
What we are proposing here is multi-tenancy as part of the core Ozone Manager
module. Please check the recently uploaded picture
uml_multitenant_interface_design.png.
We will have an extension of IAccessAuthorizer that is multi-tenancy aware and
enforces all multi-tenant access controls, and a multi-tenant manager module
that connects the APIs for configuring multi-tenant access with the enforcement
of isolation. The added advantage is that, because this lives in the core Ozone
Manager and is completely authentication-system independent, the feature can be
extended to Kerberos users as well as to any new authentication (OIDC) support
we add in the future.
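To make the shape of that authorizer concrete, here is a minimal sketch against the existing IAccessAuthorizer contract. It is only an illustration: MultiTenantAccessAuthorizer and TenantRegistry are hypothetical names, and the tenant-lookup calls are assumptions, not existing Ozone APIs.
{code:java}
import org.apache.hadoop.ozone.om.exceptions.OMException;
import org.apache.hadoop.ozone.security.acl.IAccessAuthorizer;
import org.apache.hadoop.ozone.security.acl.IOzoneObj;
import org.apache.hadoop.ozone.security.acl.RequestContext;

/**
 * Hypothetical sketch only: a multi-tenancy-aware authorizer layered on the
 * existing IAccessAuthorizer contract. TenantRegistry is an assumed helper
 * that knows which tenant (account namespace) a user or resource belongs to.
 */
public class MultiTenantAccessAuthorizer implements IAccessAuthorizer {

  /** Assumed lookup interface; see the native mapping sketch further below. */
  public interface TenantRegistry {
    String getTenant(String userName);
    String getTenantOfResource(IOzoneObj obj);
    boolean isAccessAllowed(String callerTenant, String resourceTenant);
  }

  private final IAccessAuthorizer delegate;     // e.g. native or Ranger authorizer
  private final TenantRegistry tenantRegistry;  // assumed user -> tenant lookup

  public MultiTenantAccessAuthorizer(IAccessAuthorizer delegate,
      TenantRegistry tenantRegistry) {
    this.delegate = delegate;
    this.tenantRegistry = tenantRegistry;
  }

  @Override
  public boolean checkAccess(IOzoneObj ozoneObject, RequestContext context)
      throws OMException {
    // First enforce account-namespace isolation: the caller's tenant must be
    // allowed to reach the tenant that owns the resource.
    String callerTenant =
        tenantRegistry.getTenant(context.getClientUgi().getShortUserName());
    String resourceTenant = tenantRegistry.getTenantOfResource(ozoneObject);
    if (!tenantRegistry.isAccessAllowed(callerTenant, resourceTenant)) {
      return false;
    }
    // Then fall back to the regular ACL/policy evaluation.
    return delegate.checkAccess(ozoneObject, context);
  }
}
{code}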
>> As Marton already suggested, at the end of the day if we want to implement
>> account namespace isolation, we still would need a
>> unique identifier for all entities that are reaching the cluster. At the
>> moment as we discussed we have the AWS access key id
>> coupled to a kerberos principal, and we map the two together via a mapping
>> of UGI to an AWS style access key id. This tight
>> coupling seems to be a problem, but as I see, the coupling of
>> authentication, groupping, authorization, and a few other things is
>> our problem here. Hence I suggested to decouple these, but I am unsure if
>> this was clear for everyone, and I felt it was not. So I try
>> to rephrase the suggestion.
In the existing S3 design, the access-key-id is tied to a Kerberos identity. It
does not have to stay that way. We are providing an API to create an S3 user
who does not have any Kerberos identity at all, along these lines:
* Parameters : Tenant-Name, new-user-Name,
credentials-for-person-invoking-this-API, ...
* Return value : the S3 shared-secret password for the new user on success, or
an error code
The new username has nothing to do with any Kerberos identity. Also, we can
construct the UGI struct however we want during access authorization inside
Ozone Manager. A rough sketch of such an API follows.
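Purely as an illustration of the parameter/return-value shape above, a sketch of what this API could look like; TenantUserManager and S3Secret are hypothetical names, not the final interface.
{code:java}
import org.apache.hadoop.ozone.om.exceptions.OMException;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Hypothetical sketch of the tenant user creation API described above.
 * Names (TenantUserManager, S3Secret) are illustrative only.
 */
public interface TenantUserManager {

  /** Simple holder for the generated S3 credentials. */
  class S3Secret {
    public final String accessId;
    public final String secret;
    public S3Secret(String accessId, String secret) {
      this.accessId = accessId;
      this.secret = secret;
    }
  }

  /**
   * Creates a new S3 user under the given tenant. The new user needs no
   * Kerberos identity; it is identified only by the returned credentials.
   *
   * @param tenantName  the tenant (account namespace) to add the user to
   * @param newUserName the new user's name within the tenant
   * @param callerUgi   identity of the caller invoking this API
   * @return the S3 access-id / shared-secret pair for the new user
   * @throws OMException on failure, e.g. if the user already exists
   */
  S3Secret createTenantUser(String tenantName, String newUserName,
      UserGroupInformation callerUgi) throws OMException;
}
{code}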
Let's look at the possibility that users coming from different mechanisms
(Kerberos/S3/OIDC) could collide.
To deal with this, Ozone as a system could provide some safeguards. But at the
same time, I believe the onus of setting up the Ozone cluster, and of deciding
who can access the system with a unique or duplicate identity, lies with the
customer/administrators of Ozone. They can also follow a uniform naming
convention so that users of the system do not collide. The simplest safeguard
Ozone can offer is to return an EEXIST error when a user already exists in the
system (or even allow duplicate identities with a --force option), as sketched
below. We do not have to add complexity in Ozone to cover all possibilities.
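A rough sketch of that safeguard, continuing the hypothetical createTenantUser API above (table, helper, and result-code names are illustrative, not the real OM code):
{code:java}
// Hypothetical sketch of the duplicate-identity safeguard described above.
public S3Secret createTenantUser(String tenantName, String newUserName,
    UserGroupInformation callerUgi, boolean force) throws OMException {
  String userKey = tenantName + "$" + newUserName;
  if (tenantUserTable.containsKey(userKey) && !force) {
    // Simplest safeguard: fail fast when the identity already exists,
    // unless the administrator explicitly overrides with --force.
    throw new OMException("User already exists: " + userKey,
        OMException.ResultCodes.INVALID_REQUEST); // illustrative result code
  }
  S3Secret secret = generateSecret(userKey);      // assumed helper
  tenantUserTable.put(userKey, secret);           // assumed persistent table
  return secret;
}
{code}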
>> During the cross tenancy access discussion part, I had a concern coming up
>> but could not get to discuss it online.
>> You said during the meeting that for maybe everyone, but for at least tenant
>> admins, all users are visible, did I misunderstand or isn't this
>> contradictory with the requirements? I mean, if tenant admins can see just
>> the tenant administered by themselves, then seeing all users possibly with a
>> tenant id is something that goes beyond this limit, as based on the list of
>> users all tenants become visible.
>> This can be a problem in case where tenants are not different
>> sub-organizations within the same company, but strictly segregated
>> companies renting cluster space at some provider, or in case where there is
>> a tenant which is just hosted for a project, sharing limited information,
>> but possibly not the list of all users.
We are discussing the default behavior here. In a typical organization, by
default, you can list all users, even those from other departments. Having said
that, the way we are designing this, all resources can be access-controlled,
including the user-list API. The default behavior is to make the user listing
visible to all, but if a tenant admin wants, they can hide their users with an
appropriate user policy.
>> I still feel that we need an authentication agnostic internal user
>> representation which is provided by the authentication plugin used, and
>> contains at least external user id, internal user id, and groups associated
>> with the user by the systems we use for authentication. If this, or
>> something even better is what we will use and we use it with the volume
>> abstraction I may be more than happy.
Let us not focus on authentication here, as it can take us in a different
direction. We are not doing anything new for the authentication schemes in
Ozone; the existing schemes remain as they are. We are only doing
multi-tenancy-aware authorization in the core Ozone Manager, plus we are
providing a way for S3 users to connect to this multi-tenancy feature in core
Ozone.
I believe we should just take the username/access-id as provided by the
authentication system; we need not do another mapping from an external to an
internal id. There could be a valid use case where a user "bob" can
authenticate either through Kerberos or through the S3 shared secret. We should
leave it to the system admin to decide whether their user ids are unique (or
duplicated) when creating S3 or Kerberos identities. Unless there is a use case
that we cannot solve with what we have, we need not complicate this part.
>> coupling the authentication scheme to kerberos via Hadoop UGI even in the S3
>> gateway by mapping the access key id to a UGI as
>> we discussed. If this proposal does not limit or extend this, then we will
>> have a problem later on as I feel, and I don't see how it will
>> be solved. What I understood so far is that we are using UGI, and the Hadoop
>> groups mapping internally, I think, if we do not
>> decouple ourselves from these, and an auth plugin that provide us a unified
>> view of user, and its groups/roles/tenancy then we will
>> end up patching and workaround UGI internals to conform with our needs.
>> That's is why I push to separate these from our internal
>> representation and push it to a pluggable authenticator, that provides us
>> the external user, based on that our internal user, and
>> also its groups/roles/tenancy information. I think, this is an important
>> prerequisite before we can talk about tenancy and that
>> abstraction. Also it is really unclear, how an external auth system like
>> LDAP will be configured to provide us the tenant
>> identification for example? Will we have an internal map from user to
>> tenant? How exactly you are envisioning this? Describing
>> alternatives and reasons for selecting the actually implemented one, would
>> be good to understand here as well, based on that I
>> hope to be as optimistic as you are and I hope to think about this as just a
>> high level grouping of users.
Looking at the code, I believe whatever we proposed in the design is doable.
The UGI can be populated however we want inside Ozone Manager and then passed
to the authorizer plugin for checkAccess(), roughly as sketched below.
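A rough illustration of that flow, assuming the existing IAccessAuthorizer and RequestContext classes (the exact builder methods and the choice of ACL type here are illustrative rather than authoritative):
{code:java}
// Sketch only: build a UGI for the authenticated caller (Kerberos principal,
// S3 access-id, or OIDC subject) and let the pluggable authorizer decide.
UserGroupInformation ugi = UserGroupInformation.createRemoteUser(userName);

RequestContext context = RequestContext.newBuilder()
    .setClientUgi(ugi)
    .setAclType(IAccessAuthorizer.ACLIdentityType.USER)
    .setAclRights(IAccessAuthorizer.ACLType.READ)
    .build();

// 'authorizer' is whichever IAccessAuthorizer implementation is configured
// (native, Ranger-based, or a future multi-tenancy-aware one).
boolean allowed = authorizer.checkAccess(ozoneObj, context);
{code}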
>> This brings us back to the grouping problem, where from these subsystems
>> will Ozone figure out what groups are there? For this
>> to integrate smoothly we will need a subsystem/plugin that handles the
>> account namespace, and maps to internally understood
>> entities as I can think of it know. I would couple this with authentication
>> as proposed, because that is the point we can and
>> probably do map the user to tenant/roles/groups. Again I am maybe ignorant,
>> and something similar is in the making, but I would
>> like to understand what is in the making, and understand what are the
>> benefits if something else is the plan.
Just like the current authorizer, which has native support and Ranger support,
this design will have a multi-tenant authorizer with both Ranger support and
native Ozone support. Please note that the "group management" referred to here
is only being used to support the account namespace; it has nothing to do with
exposing group management as a feature to Ozone users, and we have no plans to
support "groups" as a user-facing feature. How to isolate users into account
namespaces is very much an implementation-dependent aspect; different
multi-tenant authorizer plugins can implement it in different ways. From a
high-level design perspective, we are not prescribing how any specific
implementation should enforce the account namespace. When we say that the
Ranger plugin can use the group/role abstraction it already has, we are just
giving an example of how easily a Ranger-based plugin can provide
account-namespace isolation. For a native Ozone plugin, it could be something
as simple as keeping a persistent mapping from user names to the account
namespace they belong to, as sketched below.
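For instance, a native plugin's bookkeeping could be as small as the following sketch (class and method names are hypothetical; a real implementation would persist the mapping in OM's metadata store rather than in memory):
{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a native account-namespace mapping: which tenant
 * (account namespace) each user belongs to, consulted by the authorizer.
 */
public class NativeTenantRegistry {

  // In a real implementation this would be a persistent table in the OM
  // metadata store; a ConcurrentHashMap keeps the sketch self-contained.
  private final Map<String, String> userToTenant = new ConcurrentHashMap<>();

  public void assignUserToTenant(String userName, String tenantName) {
    userToTenant.put(userName, tenantName);
  }

  public Optional<String> getTenant(String userName) {
    return Optional.ofNullable(userToTenant.get(userName));
  }

  /** Account-namespace check: does this user belong to the given tenant? */
  public boolean isInTenant(String userName, String tenantName) {
    return tenantName.equals(userToTenant.get(userName));
  }
}
{code}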
>> One is the thing we mentioned in the call, namely how a user will access a
>> bucket in its own tenant, which access points will be used?
All S3 users, regardless of their tenancy, would go to the same S3 access point.
>> How we will figure out which resource to serve?
The API call itself specifies which resource is being accessed.
>> The simple case when a user belongs to one tenant, and accesses a resource
>> in its tenant I can imagine also in a multitude of
>> ways... but I can not come up with and have not seen an easy user friendly
>> way to access tenant b's foo bucket (even if access is
>> granted for me) if I am in a tenant where there is a foo bucket as well. Are
>> there any ideas on this already?
Buckets can be referred to by the "tenant-name:bucketname" syntax and can be
accessed by anyone who has authorization to access them. There are other ways
to do it too, but for now we plan to support the "tenant-name:bucketname"
convention; a rough sketch of resolving such a reference is below.
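A purely illustrative sketch of resolving such a bucket reference on the gateway side; the class name and the assumption that a tenant's buckets live under a volume named after the tenant are mine, not the final design.
{code:java}
/**
 * Hypothetical sketch: resolve "tenant-name:bucketname" into the volume and
 * bucket a request should be routed to. A bare "bucketname" falls back to the
 * caller's own tenant volume (or the legacy s3v volume).
 */
public final class TenantBucketRef {

  public final String volume;  // volume backing the tenant's account namespace
  public final String bucket;

  private TenantBucketRef(String volume, String bucket) {
    this.volume = volume;
    this.bucket = bucket;
  }

  public static TenantBucketRef parse(String s3BucketName,
      String callerDefaultVolume) {
    int idx = s3BucketName.indexOf(':');
    if (idx < 0) {
      // No tenant prefix: stay within the caller's own namespace.
      return new TenantBucketRef(callerDefaultVolume, s3BucketName);
    }
    String tenant = s3BucketName.substring(0, idx);
    String bucket = s3BucketName.substring(idx + 1);
    // Assumption: each tenant's buckets live under a volume named after it.
    return new TenantBucketRef(tenant, bucket);
  }
}
{code}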
>> An other question is whether and how much we want to segregate tenants data,
>> I guess volumes and buckets are already
>> providing some segregation on the container/block level, but how we will
>> enforce that access remains within boundaries when the
>> user reads data from the DNs? Should we have concerns about accessing other
>> tenants data via DN api's? How we can enforce
>> access on that level? Marton Elek Do you have any idea if we potentially
>> have to think about exploits here, or in the SCM layer, as
>> we mostly talked about things that fell into OM territory.
We should already have enough security safeguards in the DataNode to make sure
one user cannot access another user's data; I do not see why it would be any
different with multi-tenancy support. If any such exploit exists now, it would
be a problem even without multi-tenancy.
>> The proposal does not mention, but I guess, if a system is configured with
>> a certain mix of isolation styles, then reconfiguring it to another style is
>> not possible, or is it? (I think of for example going from having just
>> account namespace isolation to account and bucket namespace isolation, or
>> vice versa for example)
Yes, this is correct. Once a style of multi-tenancy is chosen, it can not be
changed.
>> In a multi-tenant word, why do we want to still have the S3V volume as
>> something everyone has access to? If admins want to configure it that way,
>> this seems to be ok, but if we provide that by default, then how users will
>> tell if they want to use that? Should users be enabled to use that by def?
>> What if we do not have anything there, or what if we want to migrate some
>> stuff there when multi-tenancy is turned on?
It's for backward compatibility. The default Kerberos users with their S3
identities can continue to access the S3V volume. We can also use the
"tenant-name:bucketname" syntax mentioned above for accessing buckets in S3V
(subject to authorization).
>> How we think about sub-groups within a tenant? What if later on we want to
>> spin off a tenant from one of these sub-groups? How this possibility -if we
>> want to provide a way - affects our current thinking?
Sub-groups can always be implemented by defining another group. Spinning off a
tenant from a sub-group is an unknown requirement for now, but if we wanted to
do it, I think we could in the future.
>> Are there any memorable, but loosely coupled ideas or thoughts that were
>> coming up during design, that had an effect on certain
>> decisions that were made? (to add some color and more reasons on the
>> selected way of thinking)
I believe we made attempts to use whatever we already have in Ozone, or an
existing project like Apache Ranger, wherever possible to address the
requirements.