A couple of things:

1) Switch to IAM roles if at all possible - explicitly passing AWS credentials is a long and lonely road in the end.
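A minimal sketch of what the driver code from the thread below would look like once the cluster's EC2 instances carry an IAM role with access to the bucket - no access/secret key is set, so the S3A credential chain falls through to InstanceProfileCredentialsProvider (the second provider visible in the DEBUG log below). The parquet path here is a placeholder, not a path from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf())

// same s3a filesystem setting as in the thread, but no fs.s3a.access.key /
// fs.s3a.secret.key - the instance profile supplies the credentials
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

val sqlContext = SQLContext.getOrCreate(sc)

// "s3a://some-bucket/some/path" is a placeholder path
val df = sqlContext.read.parquet("s3a://some-bucket/some/path")
df.count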
2) One really bad workaround/hack is to run a job that hits every worker and writes the credentials to the proper location (~/.aws/credentials or whatever). ^^ I wouldn't recommend this. ^^ It's horrible and doesn't handle autoscaling, but I'm mentioning it anyway as a temporary fix (a rough sketch follows at the very end of this message). If you switch to IAM roles, things become a lot easier: you can authorize all of the EC2 instances in the cluster, and it handles autoscaling very well - and at some point, you will want to autoscale.

On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:

> Chris,
>
> Good question. As you can see from the code, I set them on the driver, so I expect they will be propagated to all nodes, won't they?
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> Are the credentials visible from each Worker node to all the Executor JVMs on each Worker?
>>
>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>
>> Dear Spark community,
>>
>> I faced the following issue while trying to access data on S3a; my code is the following:
>>
>> val sparkConf = new SparkConf()
>> val sc = new SparkContext(sparkConf)
>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>
>> val sqlContext = SQLContext.getOrCreate(sc)
>>
>> val df = sqlContext.read.parquet(...)
>>
>> df.count
>>
>> It results in the following exception and log messages:
>>
>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>     at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>     at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>
>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>
>> Any ideas/workarounds? The AWS credentials are correct for this bucket.
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>

--
Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
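For reference, a rough sketch of the mechanics of the point-2 hack (writing a shared credentials file from a task on each executor). pushCredentials, the partition count, and the [default] profile name are illustrative, not from the thread; Spark gives no guarantee that every worker actually receives a task, which is part of why this is fragile and does not survive autoscaling:

import java.io.{File, PrintWriter}
import org.apache.spark.SparkContext

def pushCredentials(sc: SparkContext, accessKey: String, secretKey: String): Unit = {
  // use far more partitions than executors so each executor likely runs at least one task
  sc.parallelize(1 to 1000, 1000).foreachPartition { _ =>
    val dir = new File(sys.props("user.home"), ".aws")
    dir.mkdirs()
    val writer = new PrintWriter(new File(dir, "credentials"))
    try {
      writer.println("[default]")
      writer.println(s"aws_access_key_id = $accessKey")
      writer.println(s"aws_secret_access_key = $secretKey")
    } finally writer.close()
  }
}

Usage would be something like pushCredentials(sc, accessKey, secretKey) right after creating the context, before any S3 reads.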