[
https://issues.apache.org/jira/browse/SPARK-19405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Burak Yavuz resolved SPARK-19405.
---------------------------------
Resolution: Fixed
Assignee: Adam Budde
Fix Version/s: 2.2.0
Resolved with: https://github.com/apache/spark/pull/16744
> Add support to KinesisUtils for cross-account Kinesis reads via STS
> -------------------------------------------------------------------
>
> Key: SPARK-19405
> URL: https://issues.apache.org/jira/browse/SPARK-19405
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Reporter: Adam Budde
> Assignee: Adam Budde
> Priority: Minor
> Fix For: 2.2.0
>
>
> h1. Summary
> Enable KinesisReceiver to utilize STSAssumeRoleSessionCredentialsProvider
> when setting up the Kinesis Client Library in order to enable secure
> cross-account Kinesis stream reads managed by AWS Security Token Service (STS).
> h1. Details
> Spark's KinesisReceiver implementation utilizes the Kinesis Client Library in
> order to allow users to write Spark Streaming jobs that operate on Kinesis
> data. The KCL uses a few AWS services under the hood in order to provide
> checkpointed, load-balanced processing of the underlying data in a Kinesis
> stream. Running the KCL requires permissions to be set up for the following
> AWS resources.
> * AWS Kinesis for reading stream data
> * AWS DynamoDB for storing KCL shared state in tables
> * AWS CloudWatch for logging KCL metrics
> The KinesisUtils.createStream() API allows users to authenticate to these
> services either by specifying an explicit AWS access key/secret key
> credential pair or by using the default credential provider chain. This
> supports authorizing to the three AWS services using either an AWS keypair
> (provided explicitly or parsed from environment variables, etc.):
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/KeypairOnly.png!
> Or the IAM instance profile (when running on EC2):
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/InstanceProfileOnly.png!
> AWS users often need to access resources across separate accounts. This could
> be done in order to consume data produced by another organization or from a
> service running in another account for resource isolation purposes. The AWS
> Security Token Service (STS) provides a secure way to authorize cross-account
> resource access by using temporary sessions to assume an IAM role in the
> AWS account with the resources being accessed.
> The [IAM
> documentation|http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html]
> covers the specifics of how cross-account IAM role assumption works in much
> greater detail, but if an actor in account A wanted to read from a Kinesis
> stream in account B, the general steps required would look something like this:
> * An IAM role is added to account B with read permissions for the Kinesis
> stream
> ** Trust policy is configured to allow account A to assume the role
> * Actor in account A uses its own long-lived credentials to tell STS to
> assume the role in account B
> * STS returns temporary credentials with permission to read from the stream
> in account B
> Applied to KinesisReceiver and the KCL, we could use a keypair as our
> long-lived credentials to authenticate to STS and assume an external role
> with the necessary KCL permissions:
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSKeypair.png!
> Or the instance profile as long-lived credentials:
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSInstanceProfile.png!
> The STSAssumeRoleSessionCredentialsProvider implementation of the
> AWSCredentialsProvider interface from the AWS SDK abstracts all of the
> management of the temporary session credentials away from the user.
> STSAssumeRoleSessionCredentialsProvider simply needs the ARN of the AWS role
> to be assumed, a session name for STS labeling purposes, an optional session
> external ID and long-lived credentials to use for authenticating with the STS
> service itself.
> Supporting cross-account Kinesis access via STS requires supplying the
> following additional configuration parameters:
> * ARN of IAM role to assume in external account
> * A name to apply to the STS session
> * (optional) An IAM external ID to validate the assumed role against
> The STSAssumeRoleSessionCredentialsProvider implementation of the
> AWSCredentialsProvider interface takes these parameters as input and
> abstracts away all of the lifecycle management for the temporary session
> credentials. Ideally, users could simply supply an AWSCredentialsProvider
> instance as an argument when creating the stream that would be distributed to
> the executors for use when setting up the KCL. Since these classes aren't
> serializable, this will require an approach similar to
> SerializableAWSCredentials where the config parameters are passed via a
> serializable object and the correct AWSCredentialsProvider implementation is
> created on the executor from the params.
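The serializable-parameters approach described above can be sketched in plain Java. This is only an illustration under stated assumptions: CredentialsProvider is a hypothetical stand-in for the SDK's AWSCredentialsProvider, and StsParams and SerializationDemo are made-up names, not classes from Spark or the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for AWSCredentialsProvider; deliberately NOT Serializable.
interface CredentialsProvider {
    String describe();
}

// Hypothetical serializable holder for the STS parameters named in this issue.
final class StsParams implements Serializable {
    private static final long serialVersionUID = 1L;

    final String stsAssumeRoleArn;
    final String stsSessionName;
    final String stsExternalId; // may be null

    StsParams(String stsAssumeRoleArn, String stsSessionName, String stsExternalId) {
        this.stsAssumeRoleArn = stsAssumeRoleArn;
        this.stsSessionName = stsSessionName;
        this.stsExternalId = stsExternalId;
    }

    // Intended to run on the executor: builds the non-serializable provider
    // from the plain string fields that were shipped over.
    CredentialsProvider toProvider() {
        String ext = stsExternalId == null ? "" : ", externalId=" + stsExternalId;
        return () -> "sts(" + stsAssumeRoleArn + ", " + stsSessionName + ext + ")";
    }
}

final class SerializationDemo {
    // Round-trips the params through Java serialization, imitating the
    // driver-to-executor hop.
    static StsParams roundTrip(StsParams params) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(params);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StsParams) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Only the strings cross the wire; the provider itself is created after deserialization, so it never needs to be serializable.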
> Following the current conventions, adding optional arguments will mean having
> to double the number of overloaded implementations of
> KinesisUtils.createStream(). For this reason, we can make stsAssumeRoleArn,
> stsSessionName and stsExternalId all required parameters for STS
> authentication (the external ID is ignored if none is specified in the trust
> policy < link > of the assumed role).
> Here are the providers that should be used for authentication depending on
> the combination of AWS parameters provided:
> ||Input params||Kinesis credentials||Long-lived credentials||
> |(none)|Use long-lived|DefaultAWSCredentialsProviderChain|
> |awsAccessKeyId, awsSecretKey|Use long-lived|AWSCredentialsProvider w/keypair|
> |stsRoleArn, stsSessionName, stsExternalId (optional)|STSAssumeRoleSessionCredentialsProvider|DefaultAWSCredentialsProviderChain|
> |awsAccessKeyId, awsSecretKey, stsRoleArn, stsSessionName, stsExternalId (optional)|STSAssumeRoleSessionCredentialsProvider|AWSCredentialsProvider w/keypair|
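The selection rules in the table can be written down as a small decision function. The sketch below uses only plain Java; the Provider enum and the ProviderChoice and CredentialSelection classes are hypothetical names invented for illustration, and it assumes a missing parameter is passed as null.

```java
// Hypothetical labels for the providers named in the table.
enum Provider { DEFAULT_CHAIN, KEYPAIR, STS_ASSUME_ROLE }

// Pair of decisions: which provider Kinesis uses, and which long-lived
// provider backs it.
final class ProviderChoice {
    final Provider kinesis;
    final Provider longLived;

    ProviderChoice(Provider kinesis, Provider longLived) {
        this.kinesis = kinesis;
        this.longLived = longLived;
    }
}

final class CredentialSelection {
    // Null arguments mean "not supplied". The external ID is omitted because,
    // per the table, it does not affect which provider is selected.
    static ProviderChoice choose(String awsAccessKeyId, String awsSecretKey,
                                 String stsRoleArn, String stsSessionName) {
        boolean hasKeypair = awsAccessKeyId != null && awsSecretKey != null;
        boolean hasSts = stsRoleArn != null && stsSessionName != null;

        // Long-lived credentials: explicit keypair if given, else the default chain.
        Provider longLived = hasKeypair ? Provider.KEYPAIR : Provider.DEFAULT_CHAIN;
        // Kinesis credentials: STS session if a role is given, else the long-lived ones.
        Provider kinesis = hasSts ? Provider.STS_ASSUME_ROLE : longLived;
        return new ProviderChoice(kinesis, longLived);
    }
}
```

The four rows of the table correspond to the four combinations of hasKeypair and hasSts.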
> Since there's now a wide variety of combinations of optional parameters for
> KinesisUtils, I think a builder pattern may provide a more manageable
> interface for creating streams in both Scala and Java. This would also make
> it feasible to specify specific AWS config params for DynamoDB and
> CloudWatch, which is supported by the KCL. I may look into submitting an
> issue/PR for this as well.
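To make the builder idea concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical: KinesisStreamConfig, its Builder and the method names are invented for illustration and are not an existing or proposed Spark API.

```java
// Hypothetical immutable configuration object produced by a builder.
final class KinesisStreamConfig {
    final String streamName;
    final String awsAccessKeyId;   // null means "use the default provider chain"
    final String awsSecretKey;
    final String stsAssumeRoleArn; // null means "no STS role assumption"
    final String stsSessionName;
    final String stsExternalId;    // optional even when STS is used

    private KinesisStreamConfig(Builder b) {
        this.streamName = b.streamName;
        this.awsAccessKeyId = b.awsAccessKeyId;
        this.awsSecretKey = b.awsSecretKey;
        this.stsAssumeRoleArn = b.stsAssumeRoleArn;
        this.stsSessionName = b.stsSessionName;
        this.stsExternalId = b.stsExternalId;
    }

    static Builder builder(String streamName) {
        return new Builder(streamName);
    }

    static final class Builder {
        private final String streamName;
        private String awsAccessKeyId;
        private String awsSecretKey;
        private String stsAssumeRoleArn;
        private String stsSessionName;
        private String stsExternalId;

        private Builder(String streamName) {
            this.streamName = streamName;
        }

        // Parameters that only make sense together are set together.
        Builder keypair(String accessKeyId, String secretKey) {
            this.awsAccessKeyId = accessKeyId;
            this.awsSecretKey = secretKey;
            return this;
        }

        Builder stsRole(String roleArn, String sessionName) {
            this.stsAssumeRoleArn = roleArn;
            this.stsSessionName = sessionName;
            return this;
        }

        Builder stsExternalId(String externalId) {
            this.stsExternalId = externalId;
            return this;
        }

        KinesisStreamConfig build() {
            if (stsExternalId != null && stsAssumeRoleArn == null) {
                throw new IllegalStateException("stsExternalId requires an STS role");
            }
            return new KinesisStreamConfig(this);
        }
    }
}
```

A caller would chain only the options it needs, for example KinesisStreamConfig.builder("events").stsRole(roleArn, sessionName).build(), which avoids one overload per parameter combination.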