Chris, thanks for the hint about IAM roles, but in my case I need to run different jobs with different S3 permissions on the same cluster, so that approach doesn't work for me, as far as I understand it.
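One direction I'm considering (just a rough sketch, not tested on this cluster): give each job its own keys through its SparkConf using the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration it builds on the driver and on every executor, so the keys stay scoped to that one application. The environment-variable names and the s3a path below are only placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// keys supplied per job, e.g. via environment variables set by whoever submits it
// (the variable names below are placeholders)
val sparkConf = new SparkConf()
  .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.s3a.access.key", sys.env("THIS_JOB_AWS_ACCESS_KEY"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("THIS_JOB_AWS_SECRET_KEY"))

val sc = new SparkContext(sparkConf)
val sqlContext = SQLContext.getOrCreate(sc)

// placeholder path
val df = sqlContext.read.parquet("s3a://some-bucket/some/prefix")
df.count

The same properties could also be passed at submit time with --conf spark.hadoop.fs.s3a.access.key=... so the application code never hardcodes them.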
Thank you,
Konstantin Kudryavtsev

On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:

> couple things:
>
> 1) switch to IAM roles if at all possible - explicitly passing AWS
> credentials is a long and lonely road in the end
>
> 2) one really bad workaround/hack is to run a job that hits every worker
> and writes the credentials to the proper location (~/.awscredentials or
> whatever)
>
> ^^ i wouldn't recommend this. ^^ it's horrible and doesn't handle
> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
>
> if you switch to IAM roles, things become a lot easier as you can
> authorize all of the EC2 instances in the cluster - and handles autoscaling
> very well - and at some point, you will want to autoscale.
>
> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Chris,
>>
>> good question, as you can see from the code I set up them on driver, so
>> I expect they will be propagated to all nodes, won't them?
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>>> are the credentials visible from each Worker node to all the Executor
>>> JVMs on each Worker?
>>>
>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <
>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>> Dear Spark community,
>>>
>>> I faced the following issue with trying accessing data on S3a, my code
>>> is the following:
>>>
>>> val sparkConf = new SparkConf()
>>> val sc = new SparkContext(sparkConf)
>>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>
>>> val sqlContext = SQLContext.getOrCreate(sc)
>>> val df = sqlContext.read.parquet(...)
>>> df.count
>>>
>>> It results in the following exception and log messages:
>>>
>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>         at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>         at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>         at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>         at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>
>>> I run standalone spark 1.5.2 and using hadoop 2.7.1
>>>
>>> any ideas/workarounds?
>>> AWS credentials are correct for this bucket
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>
>
> --
>
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
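One more note on the quoted discussion above: if upgrading hadoop-aws beyond the 2.7.1 used here ever becomes an option, newer S3A releases also support per-bucket settings of the form fs.s3a.bucket.<bucketname>.<option>, which would let one application talk to different buckets with different credentials. A rough sketch only, with invented bucket names and the key values hardcoded purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// "reports-bucket" and "logs-bucket" are made-up names; in practice the key
// values would come from a secrets store or the environment, not source code
val conf = new SparkConf()
  .set("spark.hadoop.fs.s3a.bucket.reports-bucket.access.key", "AKIA...ONE")
  .set("spark.hadoop.fs.s3a.bucket.reports-bucket.secret.key", "...")
  .set("spark.hadoop.fs.s3a.bucket.logs-bucket.access.key", "AKIA...TWO")
  .set("spark.hadoop.fs.s3a.bucket.logs-bucket.secret.key", "...")

val sc = new SparkContext(conf)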