Hi Steve,

Thank you for sharing the work done to make Amazon STS tokens work with the
s3a connector.  This works for direct HDFS to S3 bucket interaction.  Your
statement is also spot on: containers running in YARN have no mechanism to
update the triple of session credentials.
If I am not mistaken, an Amazon STS token is not renewable and has a maximum
lifetime of 12 hours.  For long-running containers, a new token must be
obtained for the AWS role before the current one expires.  There are a number
of ways to fix the session issue for YARN:

1.  The RM keeps track of the session and login secrets, and periodically
injects a fresh STS token into the running container's environment.  (A nasty
hack, since it means modifying the environment variables of a running process.)
2.  Transport the client access key and secret key to the container, and have
the container perform the re-login itself.
3.  If the user's home directory contains ~/.aws/credentials on all nodes,
this works without code changes, but it is an operational nightmare.
4.  Streamline token handling to use OIDC JWT tokens, so that client
libraries always check with the OIDC server to keep the token fresh.
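To make the constraint behind these options concrete, here is a minimal Java
sketch (not taken from the s3a code) of the refresh schedule a long-running
container would need.  It assumes the 12-hour cap mentioned above and an
arbitrary 5-minute safety margin; the real cap depends on how the credentials
are requested.  The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch: an STS session token cannot be renewed, so a
 * long-running container must obtain a brand-new token before each one
 * expires. The 12-hour cap and 5-minute margin are assumptions.
 */
public class StsRefreshSchedule {
    // Maximum session lifetime as stated in this thread (assumed, not queried from AWS).
    public static final Duration MAX_LIFETIME = Duration.ofHours(12);
    // Safety margin so a replacement is fetched before the old token lapses.
    public static final Duration MARGIN = Duration.ofMinutes(5);

    /** The requested duration, clamped to the assumed maximum lifetime. */
    public static Duration effectiveLifetime(Duration requested) {
        return requested.compareTo(MAX_LIFETIME) > 0 ? MAX_LIFETIME : requested;
    }

    /** Expiry is known up front because the caller chooses the duration. */
    public static Instant expiry(Instant issuedAt, Duration requested) {
        return issuedAt.plus(effectiveLifetime(requested));
    }

    /** The moment at which a replacement token should be requested. */
    public static Instant refreshAt(Instant issuedAt, Duration requested) {
        return expiry(issuedAt, requested).minus(MARGIN);
    }

    /** Number of distinct tokens needed to cover `runtime` under the refresh-early policy. */
    public static long tokensNeeded(Duration runtime, Duration requested) {
        Duration usable = effectiveLifetime(requested).minus(MARGIN);
        if (usable.isNegative() || usable.isZero()) {
            throw new IllegalArgumentException("requested duration must exceed the margin");
        }
        long runMillis = runtime.toMillis();
        long usableMillis = usable.toMillis();
        // Ceiling division: a partially used token still has to be obtained.
        return runMillis / usableMillis + (runMillis % usableMillis == 0 ? 0 : 1);
    }
}
```

For example, a container running for 30 hours with 12-hour tokens needs three
separate tokens under this policy, and none of them can be extended in place.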

Options 1-3 might work with the existing s3a connector work, with some
modification to applications as well.  Option 4 aims to modify the Hadoop
libraries so that they handle authentication and token renewal transparently.
This allows existing applications to keep working by swapping jar files only,
without further code modification.  It would also improve security, because
session expiration is synchronized.  I am leaning toward addressing the
fundamental problem, and I know the community has spent years of improvement
to get to this point.  However, Hadoop needs a way forward.  This discussion
should help determine whether it is essential to support OIDC as an alternate
security mechanism, how to do it using existing code, and how to do it without
breaking existing code.
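To illustrate what transparent renewal inside the libraries could look like for
option 4, here is a minimal Java sketch.  Token, the fetcher, and the clock are
hypothetical placeholders; in a real implementation the fetcher would call the
OIDC server, but no specific OIDC or Hadoop API is assumed here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical sketch for option 4: a holder that checks token freshness on
 * every use and fetches a replacement when the current token is near expiry,
 * so calling code never handles renewal itself. No real OIDC or Hadoop API
 * is implied.
 */
public class RefreshingTokenHolder {

    /** Placeholder for a bearer token (e.g. a JWT) and its expiry time. */
    public static final class Token {
        private final String value;
        private final Instant expiresAt;

        public Token(String value, Instant expiresAt) {
            this.value = Objects.requireNonNull(value);
            this.expiresAt = Objects.requireNonNull(expiresAt);
        }

        public String value() { return value; }
        public Instant expiresAt() { return expiresAt; }
    }

    private final Supplier<Token> fetcher;  // assumed: asks the identity provider for a fresh token
    private final Supplier<Instant> clock;  // injectable time source, e.g. Instant::now
    private final Duration skew;            // refresh this long before the token expires
    private Token current;                  // cached token, null until first use

    public RefreshingTokenHolder(Supplier<Token> fetcher, Supplier<Instant> clock, Duration skew) {
        this.fetcher = Objects.requireNonNull(fetcher);
        this.clock = Objects.requireNonNull(clock);
        this.skew = Objects.requireNonNull(skew);
    }

    /** Returns the cached token, or fetches a new one if it is missing or within `skew` of expiry. */
    public synchronized Token get() {
        Instant now = clock.get();
        if (current == null || !now.isBefore(current.expiresAt().minus(skew))) {
            current = fetcher.get();
        }
        return current;
    }
}
```

Callers simply call get() before each request and never deal with expiry
themselves, which is what would let applications pick up the behaviour by
swapping jars.  The method is synchronized so that concurrent threads do not
trigger duplicate fetches.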

Regards,
Eric

On Thu, May 21, 2020 at 9:22 AM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> On Wed, 6 May 2020 at 23:32, Eric Yang <eric...@gmail.com> wrote:
>
> > Hi all,
> >
> >
> > 4.  Passing different form of tokens does not work well with cloud
> > provider security mechanism.  For example, passing AWS sts token for S3 bucket.
> > There is no renewal mechanism, nor good way to identify when the token
> > would expire.
> >
> >
> well, HADOOP-14556 does it fairly well, supporting session and role tokens.
> We even know when they expire because we ask for a duration when we request
> the session/role creds.
> See org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier for
> the core of what we marshall, including encryption secrets.
>
> The main issue there is that Yarn can't refresh those tokens because a new
> triple of session credentials are required; currently token renewal assumes
> the token is unchanged and a request is made to the service to update their
> table of issued tokens. But even if the RM could get back a new token from
> a refresh call, we are left with the problem of "how to get an updated set
> of creds to each process"
>
