Chris, thanks for the hint about IAM roles, but in my case I need to run different jobs with different S3 permissions on the same cluster, so that approach doesn't work for me, as far as I understand it.
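One direction I'm considering (just a rough sketch, not tested on this cluster): give each job its own keys through its SparkConf using the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration it builds on the driver and on every executor, so the keys stay scoped to that one application. The environment-variable names and the s3a path below are only placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// keys supplied per job, e.g. via environment variables set by whoever submits it
// (the variable names below are placeholders)
val sparkConf = new SparkConf()
  .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.s3a.access.key", sys.env("THIS_JOB_AWS_ACCESS_KEY"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("THIS_JOB_AWS_SECRET_KEY"))

val sc = new SparkContext(sparkConf)
val sqlContext = SQLContext.getOrCreate(sc)

// placeholder path
val df = sqlContext.read.parquet("s3a://some-bucket/some/prefix")
df.count

The same properties could also be passed at submit time with --conf spark.hadoop.fs.s3a.access.key=... so the application code never hardcodes them.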
Thank you,
Konstantin Kudryavtsev

On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:

> couple things:
>
> 1) switch to IAM roles if at all possible - explicitly passing AWS
> credentials is a long and lonely road in the end
>
> 2) one really bad workaround/hack is to run a job that hits every worker
> and writes the credentials to the proper location (~/.awscredentials or
> whatever)
>
> ^^ i wouldn't recommend this. ^^ it's horrible and doesn't handle
> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
>
> if you switch to IAM roles, things become a lot easier as you can
> authorize all of the EC2 instances in the cluster - and handles autoscaling
> very well - and at some point, you will want to autoscale.
>
> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Chris,
>>
>> good question, as you can see from the code I set up them on driver, so
>> I expect they will be propagated to all nodes, won't them?
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>>> are the credentials visible from each Worker node to all the Executor
>>> JVMs on each Worker?
>>>
>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <
>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>> Dear Spark community,
>>>
>>> I faced the following issue with trying accessing data on S3a, my code
>>> is the following:
>>>
>>> val sparkConf = new SparkConf()
>>> val sc = new SparkContext(sparkConf)
>>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>
>>> val sqlContext = SQLContext.getOrCreate(sc)
>>> val df = sqlContext.read.parquet(...)
>>> df.count
>>>
>>> It results in the following exception and log messages:
>>>
>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>         at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>         at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>         at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>
>>> I run standalone spark 1.5.2 and using hadoop 2.7.1
>>>
>>> any ideas/workarounds?
>>> AWS credentials are correct for this bucket
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>
>
> --
>
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
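One more note on the quoted discussion above: if upgrading hadoop-aws beyond the 2.7.1 used here ever becomes an option, newer S3A releases also support per-bucket settings of the form fs.s3a.bucket.<bucketname>.<option>, which would let one application talk to different buckets with different credentials. A rough sketch only, with invented bucket names and the key values hardcoded purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// "reports-bucket" and "logs-bucket" are made-up names; in practice the key
// values would come from a secrets store or the environment, not source code
val conf = new SparkConf()
  .set("spark.hadoop.fs.s3a.bucket.reports-bucket.access.key", "AKIA...ONE")
  .set("spark.hadoop.fs.s3a.bucket.reports-bucket.secret.key", "...")
  .set("spark.hadoop.fs.s3a.bucket.logs-bucket.access.key", "AKIA...TWO")
  .set("spark.hadoop.fs.s3a.bucket.logs-bucket.secret.key", "...")

val sc = new SparkContext(conf)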