[ 
https://issues.apache.org/jira/browse/SPARK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192411#comment-14192411
 ] 

Chris Fregly commented on SPARK-3640:
-------------------------------------

Agreed that this was not ideal when I first chose this implementation.  And as 
you mentioned, the NotSerializableException is exactly why I went with the 
DefaultCredentialsProvider.

So I spent some time trying to solve this using AWS IAM Roles on separate users 
under your root AWS account.  This appears to work well with the existing 
DefaultCredentialsProvider.

Is this a viable option for you?  

Basically, every user would get their own ACCESS_KEY_ID and SECRET_KEY.  This 
would be used in place of the root credentials.
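Concretely, each unix user would put their own IAM user's keys where the 
DefaultCredentialsProvider chain already looks (environment variables or a 
per-user credentials file).  A minimal sketch, with placeholder keys (the key 
values here are made up, not real credentials):

```python
import os

# Hypothetical per-user IAM keys -- placeholders, not real credentials
access_key_id = "AKIAEXAMPLEKEY"
secret_key = "exampleSecretKey"

# The DefaultCredentialsProvider chain checks these environment variables,
# so each unix user can export their own pair (e.g. in ~/.bashrc) and the
# existing Spark Kinesis receiver picks them up with no code change.
os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = secret_key
```

Since each unix user has their own environment and home directory, this keeps 
the per-user separation without touching the KinesisUtils API.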

For thoroughness, I've included links to the instructions as well as an example 
IAM Policy JSON.  (I'll also add this to the Spark Kinesis Developer Guide: 
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html)

Creating IAM users
        http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html
        https://console.aws.amazon.com/iam/home?#security_credential 

Setting up Kinesis, DynamoDB, and CloudWatch IAM Policy for the new users
        http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-using-iam.html

IAM Policy Generator
        http://awspolicygen.s3.amazonaws.com/policygen.html

Attaching the Custom Policy 
        https://console.aws.amazon.com/iam/home?#users
        Select the user
        Select Attach Policy
        Select Custom Policy

IAM Policy JSON 
        This is already generated using the Policy Generator above... just fill 
in the missing pieces specific to your environment.
{
  "Statement": [
    {
      "Sid": "Stmt1414784467497",
      "Action": "kinesis:*",
      "Effect": "Allow",
      "Resource": 
"arn:aws:kinesis:<region-of-stream>:<aws-account-id>:stream/<stream-name>"
    },
    {
      "Sid": "Stmt1414784693732",
      "Action": "dynamodb:*",
      "Effect": "Allow",
      "Resource": 
"arn:aws:dynamodb:us-east-1:<aws-account-id>:table/<dynamodb-tablename>"
    },
    {
      "Sid": "Stmt1414785131046",
      "Action": "cloudwatch:*",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
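If it helps, here's a small sketch that fills in the placeholders 
programmatically and round-trips the result through a JSON parser as a sanity 
check.  The region, account id, stream name, and table name below are made-up 
example values for your environment:

```python
import json

# Hypothetical environment-specific values -- replace with your own
region = "us-east-1"
account_id = "123456789012"
stream_name = "myKinesisStream"
table_name = "KinesisWordCount"

policy = {
    "Statement": [
        {"Sid": "Stmt1414784467497", "Action": "kinesis:*", "Effect": "Allow",
         "Resource": f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"},
        # DynamoDB region intentionally fixed to us-east-1 (see the notes below)
        {"Sid": "Stmt1414784693732", "Action": "dynamodb:*", "Effect": "Allow",
         "Resource": f"arn:aws:dynamodb:us-east-1:{account_id}:table/{table_name}"},
        {"Sid": "Stmt1414785131046", "Action": "cloudwatch:*", "Effect": "Allow",
         "Resource": "*"},
    ]
}

# Emitting and re-parsing confirms the policy document is well-formed JSON
print(json.dumps(policy, indent=2))
```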

Notes:
* The region of the DynamoDB table is intentionally hard-coded to us-east-1, as 
this is how Kinesis currently works
* The DynamoDB table name is the same as the application name of the Kinesis 
Streaming Application.  The sample included with the Spark distribution uses 
KinesisWordCount for the application/table name.


Is this a sufficient workaround?  Using IAM Policies is an AWS best practice, 
but I'm not sure whether it aligns with your existing environment.  If not, I 
can continue to investigate exposing that CredentialsProvider.
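For reference, exposing the provider would amount to the usual 
"optional credentials, sensible default" pattern.  A generic Python sketch of 
that shape (the class and function names here are hypothetical, not Spark's or 
the AWS SDK's actual API):

```python
class DefaultCredentialsProvider:
    """Stand-in for the AWS SDK's default provider chain (hypothetical)."""
    def get_credentials(self):
        # In the real chain this would come from env vars, a credentials
        # file, or the instance profile
        return ("from-env-or-instance-profile", "secret")

class StaticCredentialsProvider:
    """Wraps explicitly supplied keys, mirroring what KinesisUtils could accept."""
    def __init__(self, access_key_id, secret_key):
        self._creds = (access_key_id, secret_key)
    def get_credentials(self):
        return self._creds

def create_stream(stream_name, credentials_provider=None):
    # Fall back to the default chain only when the caller supplies nothing,
    # instead of forcing DefaultCredentialsProvider unconditionally
    provider = credentials_provider or DefaultCredentialsProvider()
    return {"stream": stream_name, "credentials": provider.get_credentials()}
```

The serialization concern would still need handling on the Spark side (the 
provider has to survive shipping to the worker running the receiver), which is 
why the default chain was attractive in the first place.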

Lemme know, Aniket!


> KinesisUtils should accept a credentials object instead of forcing 
> DefaultCredentialsProvider
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3640
>                 URL: https://issues.apache.org/jira/browse/SPARK-3640
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.1.0
>            Reporter: Aniket Bhatnagar
>              Labels: kinesis
>
> KinesisUtils should accept AWS Credentials as a parameter and should default 
> to DefaultCredentialsProvider if no credentials are provided. Currently, the 
> implementation forces usage of DefaultCredentialsProvider which can be a pain 
> especially when jobs are run by multiple  unix users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
