I have to strongly disagree with making UGI.doAs() private. Just because you feel that impersonation isn't an important feature does not make it so for all users. There are many valid use cases which require impersonation, and in fact I consider this to be one of the differentiating features of the Hadoop ecosystem. We make use of it heavily to build a variety of services which would not be possible without it. Also consider that in addition to gateway services such as Knox being broken by this change, you would also cripple job schedulers such as Oozie. Running workloads on YARN as different users is vital to ensure that queue resources are allocated and accounted for properly and that file permissions are enforced. Without impersonation, all users of a cluster would need to be granted access to talk directly to YARN. Higher-level access points or APIs would not be possible.

Craig Condit
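As a concrete illustration of the impersonation pattern these services rely on, here is a minimal sketch using Hadoop's UserGroupInformation API: a trusted service principal performs filesystem calls as the end user, so permissions and accounting apply to that user. The end-user name "alice" and the path are made-up examples, and the call only succeeds if core-site.xml grants this service the corresponding hadoop.proxyuser.* privileges.

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The service's own credentials (today a Kerberos login).
        UserGroupInformation serviceUgi = UserGroupInformation.getLoginUser();

        // Impersonate the end user; allowed only if core-site.xml grants
        // hadoop.proxyuser.<service>.hosts/groups for this service.
        UserGroupInformation proxyUgi =
                UserGroupInformation.createProxyUser("alice", serviceUgi);

        // Everything inside doAs() is authorized as "alice", so HDFS
        // permissions and YARN queue accounting apply to her, not the service.
        FileStatus[] listing = proxyUgi.doAs(
                (PrivilegedExceptionAction<FileStatus[]>) () -> {
                    FileSystem fs = FileSystem.get(conf);
                    return fs.listStatus(new Path("/user/alice"));
                });

        for (FileStatus status : listing) {
            System.out.println(status.getPath());
        }
    }
}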
________________________________
From: Eric Yang <eric...@gmail.com>
Sent: Wednesday, May 20, 2020 1:57 PM
To: Akira Ajisaka <aajis...@apache.org>
Cc: Hadoop Common <common-dev@hadoop.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] Secure Hadoop without Kerberos

Hi Akira,

Thank you for the information. Knox plays a main role as a reverse proxy for the Hadoop cluster. I understand the importance of keeping Knox running to centralize the audit log for ingress into the cluster. Other reverse proxy solutions like Nginx are more feature-rich for caching static content and load balancing. It would be great to have the ability to use either Knox or Nginx as the reverse proxy solution.

A company-wide OIDC provider is likely to run independently of the Hadoop cluster, but it could also run inside a Hadoop cluster. The reverse proxy must be able to redirect to the OIDC provider where an exposed endpoint is appropriate. HADOOP-11717 was a good effort to enable SSO integration, except it is written to extend Kerberos authentication, which keeps decoupling from Kerberos from becoming a reality.

I gathered a few design requirements this morning, and contributions are welcome:

1. Encryption is mandatory. Server certificate validation is required.
2. The existing token infrastructure for block access tokens remains the same.
3. Replace the delegation token transport with OIDC JWT tokens.
4. Patch the token renewer logic to renew the token with the OIDC endpoint before it expires (a rough sketch appears after this thread).
5. Impersonation logic uses service user credentials. A new way to renew service user credentials securely is needed.
6. Replace the Hadoop RPC SASL transport with TLS, because OIDC works with TLS natively.
7. CLI improvements to use environment variables or files for accessing client credentials.
8. Downgrade the use of UGI.doAs() to private to Hadoop. Services should not run with elevated privileges unless there is a good reason for it (e.g. loading Hive external tables).

I think this is a good starting point, and feedback can help turn these requirements into tasks. Let me know what you think. Thanks.

regards,
Eric

On Tue, May 19, 2020 at 9:47 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Eric, thank you for starting the discussion.
>
> I'm interested in OpenID Connect (OIDC) integration.
>
> In addition to the benefits (security, cloud native), operating costs may
> be reduced in some companies.
> We have our company-wide OIDC provider and enable SSO for Hadoop Web UIs
> via Knox + OIDC at Yahoo! JAPAN.
> On the other hand, Hadoop administrators have to manage their own KDC
> servers only for the Hadoop ecosystem.
> If Hadoop and its ecosystem can support OIDC, we don't have to manage a
> KDC, and that way operating costs will be reduced.
>
> Regards,
> Akira
>
> On Thu, May 7, 2020 at 7:32 AM Eric Yang <eric...@gmail.com> wrote:
>
>> Hi all,
>>
>> Kerberos was developed a decade before web development became popular.
>> There are some Kerberos limitations which do not work well in Hadoop. A
>> few examples of corner cases:
>>
>> 1. A Kerberos principal doesn't encode a port number, so it is difficult
>> to know whether the principal is coming from an authorized daemon or from
>> a hacker container trying to forge a service principal.
>> 2. Hadoop Kerberos principals are used as high-privileged principals, a
>> form of credential to impersonate end users.
>> 3. Delegation tokens may allow expired users to continue to run jobs long
>> after they are gone, without rechecking whether the end user's credentials
>> are still valid.
>> 4. Passing different forms of tokens does not work well with cloud
>> provider security mechanisms. For example, when passing an AWS STS token
>> for an S3 bucket, there is no renewal mechanism, nor a good way to
>> identify when the token will expire.
>>
>> There are companies that work on bridging security mechanisms of
>> different types, but this is not a primary goal for Hadoop. Hadoop can
>> benefit from modernized security using open standards like OpenID Connect,
>> which proposes to unify web applications using SSO. This ensures the
>> client credentials are transported at each stage of the client-server
>> interaction. This may improve overall security and provide a more
>> cloud-native form factor. I wonder if there is any interest in the
>> community in enabling Hadoop OpenID Connect integration work?
>>
>> regards,
>> Eric
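As a rough sketch of requirement 4 in Eric's list (renew the token with the OIDC endpoint before it expires), combined with requirement 7 (client credentials from environment variables): the token endpoint URL, the environment variable names, and the class name are assumptions made for illustration, not an existing Hadoop API or an agreed design.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class OidcTokenRenewer {

    // Hypothetical token endpoint; in practice it would be discovered from
    // the provider's /.well-known/openid-configuration document.
    private static final URI TOKEN_ENDPOINT =
            URI.create("https://idp.example.com/oauth2/token");

    public static void main(String[] args) throws Exception {
        // Requirement 7: no keytab or ticket cache, just client credentials
        // supplied through the environment (or a file).
        String clientId = System.getenv("HADOOP_OIDC_CLIENT_ID");
        String clientSecret = System.getenv("HADOOP_OIDC_CLIENT_SECRET");

        String form = "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);

        // Requirement 1: the request goes over TLS and the default trust
        // store is used to validate the server certificate.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(TOKEN_ENDPOINT)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON response carries the JWT ("access_token") and its lifetime
        // ("expires_in"); a renewer thread would schedule the next refresh a
        // safe margin before that lifetime elapses.
        System.out.println(response.body());
    }
}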