A couple of things:

1) Switch to IAM roles if at all possible - explicitly passing AWS credentials is a long and lonely road in the end.
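A minimal sketch of what the driver code from the thread below would look like once the cluster's EC2 instances carry an IAM role with access to the bucket - no access/secret key is set, so the S3A credential chain falls through to InstanceProfileCredentialsProvider (the second provider visible in the DEBUG log below). The parquet path here is a placeholder, not a path from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf())

// same s3a filesystem setting as in the thread, but no fs.s3a.access.key /
// fs.s3a.secret.key - the instance profile supplies the credentials
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

val sqlContext = SQLContext.getOrCreate(sc)

// "s3a://some-bucket/some/path" is a placeholder path
val df = sqlContext.read.parquet("s3a://some-bucket/some/path")
df.count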
2) One really bad workaround/hack is to run a job that hits every worker and writes the credentials to the proper location (~/.aws/credentials or whatever). ^^ I wouldn't recommend this. ^^ It's horrible and doesn't handle autoscaling, but I'm mentioning it anyway as a temporary fix (a rough sketch follows at the very end of this message). If you switch to IAM roles, things become a lot easier: you can authorize all of the EC2 instances in the cluster, and it handles autoscaling very well - and at some point, you will want to autoscale.

On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:

> Chris,
>
> Good question. As you can see from the code, I set them on the driver, so I expect they will be propagated to all nodes, won't they?
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> Are the credentials visible from each Worker node to all the Executor JVMs on each Worker?
>>
>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>
>> Dear Spark community,
>>
>> I faced the following issue while trying to access data on S3a; my code is the following:
>>
>> val sparkConf = new SparkConf()
>> val sc = new SparkContext(sparkConf)
>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>
>> val sqlContext = SQLContext.getOrCreate(sc)
>>
>> val df = sqlContext.read.parquet(...)
>>
>> df.count
>>
>> It results in the following exception and log messages:
>>
>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>     at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>     at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>
>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>
>> Any ideas/workarounds? The AWS credentials are correct for this bucket.
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>

--
Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
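For reference, a rough sketch of the mechanics of the point-2 hack (writing a shared credentials file from a task on each executor). pushCredentials, the partition count, and the [default] profile name are illustrative, not from the thread; Spark gives no guarantee that every worker actually receives a task, which is part of why this is fragile and does not survive autoscaling:

import java.io.{File, PrintWriter}
import org.apache.spark.SparkContext

def pushCredentials(sc: SparkContext, accessKey: String, secretKey: String): Unit = {
  // use far more partitions than executors so each executor likely runs at least one task
  sc.parallelize(1 to 1000, 1000).foreachPartition { _ =>
    val dir = new File(sys.props("user.home"), ".aws")
    dir.mkdirs()
    val writer = new PrintWriter(new File(dir, "credentials"))
    try {
      writer.println("[default]")
      writer.println(s"aws_access_key_id = $accessKey")
      writer.println(s"aws_secret_access_key = $secretKey")
    } finally writer.close()
  }
}

Usage would be something like pushCredentials(sc, accessKey, secretKey) right after creating the context, before any S3 reads.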