I have to strongly disagree with making UGI.doAs() private. Just because you 
feel that impersonation isn't an important feature does not make it so for all 
users. There are many valid use cases which require impersonation, and in fact 
I consider it one of the differentiating features of the Hadoop ecosystem. We 
make heavy use of it to build a variety of services which would not be possible 
without it. Consider also that in addition to breaking gateway services such as 
Knox, this change would cripple job schedulers such as Oozie. Running workloads 
on YARN as different users is vital to ensure that queue resources are allocated 
and accounted for properly and that file permissions are enforced. Without 
impersonation, all users of a cluster would need to be granted access to talk 
directly to YARN. Higher-level access points or APIs would not be possible.
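
To make the pattern concrete, here is a minimal sketch of how a gateway or 
scheduler service typically uses UGI.doAs() to act on behalf of an end user. 
The user name "alice" and the path are placeholders, and the service still has 
to be whitelisted via the hadoop.proxyuser.* settings in core-site.xml:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The service authenticates with its own privileged credentials (e.g. a keytab).
    UserGroupInformation service = UserGroupInformation.getLoginUser();
    // Create a proxy UGI for the end user; the NameNode/ResourceManager verify
    // this against the hadoop.proxyuser.<service>.hosts/.groups ACLs.
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("alice", service);
    // Everything inside doAs() runs as "alice", so HDFS permissions and YARN
    // queue ACLs are enforced against the end user, not the service account.
    FileStatus[] listing = proxy.doAs(
        (PrivilegedExceptionAction<FileStatus[]>) () ->
            FileSystem.get(conf).listStatus(new Path("/user/alice")));
    for (FileStatus status : listing) {
      System.out.println(status.getPath());
    }
  }
}

This is exactly the call pattern that services like Knox and Oozie depend on, 
and it is what would stop working if doAs() were made private.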

Craig Condit

________________________________
From: Eric Yang <eric...@gmail.com>
Sent: Wednesday, May 20, 2020 1:57 PM
To: Akira Ajisaka <aajis...@apache.org>
Cc: Hadoop Common <common-dev@hadoop.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] Secure Hadoop without Kerberos

Hi Akira,

Thank you for the information.  Knox plays a central role as a reverse proxy
for the Hadoop cluster.  I understand the importance of keeping Knox running to
centralize the audit log for ingress into the cluster.  Other reverse proxy
solutions like Nginx are more feature-rich for caching static content and load
balancing.  It would be great to have the ability to use either Knox or Nginx
as the reverse proxy solution.  A company-wide OIDC provider is likely to run
independently of the Hadoop cluster, but it could also run inside one.  The
reverse proxy must be able to redirect to the OIDC provider where the exposed
endpoint is appropriate.

HADOOP-11717 was a good effort to enable SSO integration, except that it was
written as an extension of Kerberos authentication, which prevents decoupling
from Kerberos from becoming a reality.  I gathered a few design requirements
this morning, and contributions are welcome:

1.  Encryption is mandatory.  Server certificate validation is required.
2.  The existing token infrastructure for block access tokens remains the same.
3.  Replace the delegation token transport with OIDC JWT tokens.
4.  Patch the token renewer logic to renew tokens against the OIDC endpoint
before they expire (see the sketch after this list).
5.  Impersonation logic uses service user credentials.  A new way to renew
service user credentials securely is needed.
6.  Replace the Hadoop RPC SASL transport with TLS, because OIDC works with
TLS natively.
7.  CLI improvements to use environment variables or files for accessing
client credentials.
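
As a rough illustration of item 4 (a sketch, not a settled design), a renewer
could refresh an OIDC token with the standard refresh_token grant against the
provider's token endpoint.  The endpoint URL, client id, and token values below
are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OidcTokenRenewer {
  // Placeholder values; a real renewer would read these from configuration.
  private static final String TOKEN_ENDPOINT = "https://idp.example.com/oauth2/token";
  private static final String CLIENT_ID = "hadoop-client";

  /** Exchanges a refresh token for a new access token before the old one expires. */
  public static String renew(String refreshToken) throws Exception {
    String form = "grant_type=refresh_token"
        + "&client_id=" + CLIENT_ID
        + "&refresh_token=" + refreshToken;
    HttpRequest request = HttpRequest.newBuilder(URI.create(TOKEN_ENDPOINT))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The JSON response carries access_token and expires_in; parsing is omitted.
    return response.body();
  }
}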

Downgrade the use of UGI.doAs() to private within Hadoop.  Services should not
run with elevated privileges unless there is a good reason for it (e.g.
loading Hive external tables).
I think this is a good starting point, and feedback can help turn these
requirements into tasks.  Let me know what you think.  Thanks

regards,
Eric

On Tue, May 19, 2020 at 9:47 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Eric, thank you for starting the discussion.
>
> I'm interested in OpenID Connect (OIDC) integration.
>
> In addition to the benefits (security, cloud native), operating costs may
> be reduced in some companies.
> We have our company-wide OIDC provider and enable SSO for Hadoop Web UIs
> via Knox + OIDC in Yahoo! JAPAN.
> On the other hand, Hadoop administrators have to manage our own KDC
> servers just for the Hadoop ecosystem.
> If Hadoop and its ecosystem can support OIDC, we won't have to manage a KDC,
> and operating costs will be reduced.
>
> Regards,
> Akira
>
> On Thu, May 7, 2020 at 7:32 AM Eric Yang <eric...@gmail.com> wrote:
>
>> Hi all,
>>
>> Kerberos was developed decades before web development became popular.
>> There are some Kerberos limitations that do not work well in Hadoop.  A
>> few examples of corner cases:
>>
>> 1. A Kerberos principal doesn't encode a port number, so it is difficult to
>> know whether the principal is coming from an authorized daemon or from a
>> hacker's container trying to forge a service principal.
>> 2. Hadoop Kerberos principals are used as highly privileged principals, a
>> form of credential to impersonate end users.
>> 3. Delegation tokens may allow expired users to continue to run jobs long
>> after they are gone, without rechecking whether end user credentials are
>> still valid.
>> 4. Passing different forms of tokens does not work well with cloud provider
>> security mechanisms.  For example, when passing an AWS STS token for an S3
>> bucket, there is no renewal mechanism, nor a good way to identify when the
>> token will expire.
>>
>> There are companies that work on bridging security mechanisms of different
>> types, but this is not a primary goal for Hadoop.  Hadoop can benefit from
>> modernized security using open standards like OpenID Connect, which
>> proposes to unify web applications using SSO.  This ensures that client
>> credentials are transported at each stage of client-server interaction.
>> This may improve overall security and provide a more cloud-native form
>> factor.  I wonder if there is any interest in the community in enabling
>> Hadoop OpenID Connect integration work?
>>
>> regards,
>> Eric
>>
>
