Hi Kostiantyn,

Can you define those properties in hdfs-site.xml and make sure that file is on 
the classpath when you spark-submit? It looks like a conf-sourcing issue to 
me. 
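For example, something along these lines (just a sketch - the property names are
the standard S3a ones, the key values are placeholders) in an hdfs-site.xml that
sits in a directory visible to both the driver and the executors, e.g. via
HADOOP_CONF_DIR in spark-env.sh on every node:

<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>

With that in place, the explicit sc.hadoopConfiguration.set(...) calls in your
code should no longer be necessary.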

Cheers,

Sent from my iPhone

> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev 
> <kudryavtsev.konstan...@gmail.com> wrote:
> 
> Chris,
> 
> thanks for the hint about IAM roles, but in my case I need to run different 
> jobs with different S3 permissions on the same cluster, so this approach 
> doesn't work for me, as far as I understand it.
> 
> Thank you,
> Konstantin Kudryavtsev
> 
>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:
>> couple things:
>> 
>> 1) switch to IAM roles if at all possible - explicitly passing AWS 
>> credentials is a long and lonely road in the end
>> 
>> 2) one really bad workaround/hack is to run a job that hits every worker and 
>> writes the credentials to the proper location (~/.awscredentials or whatever)
>> 
>> ^^ i wouldn't recommend this. ^^  it's horrible and doesn't handle 
>> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
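>> a rough sketch of what that could look like (the credentials file location,
>> the partition count, and the key values below are all made up):
>> 
>> import java.io.{File, PrintWriter}
>> 
>> // spray enough tasks across the cluster that every worker should run at least
>> // one, then write a credentials file locally on whichever machine the task hits
>> sc.parallelize(1 to 1000, 1000).foreachPartition { _ =>
>>   val f = new File(sys.props("user.home"), ".aws/credentials")  // assumed path
>>   f.getParentFile.mkdirs()
>>   val out = new PrintWriter(f)
>>   try {
>>     out.println("[default]")
>>     out.println("aws_access_key_id=...")
>>     out.println("aws_secret_access_key=...")
>>   } finally out.close()
>> }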
>> 
>> if you switch to IAM roles, things become a lot easier as you can authorize 
>> all of the EC2 instances in the cluster - it also handles autoscaling very 
>> well - and at some point, you will want to autoscale.
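>> 
>> with an instance profile attached to the instances, the job code would not
>> need to set any keys at all - roughly like this (just a sketch, the parquet
>> path is a made-up placeholder); the InstanceProfileCredentialsProvider you
>> can see in your log fetches temporary credentials from the metadata service:
>> 
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.sql.SQLContext
>> 
>> val sc = new SparkContext(new SparkConf())
>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>> // no fs.s3a.access.key / fs.s3a.secret.key anywhere - the AWS provider chain
>> // falls back to the EC2 instance metadata service and uses the role's credentials
>> val df = SQLContext.getOrCreate(sc).read.parquet("s3a://your-bucket/path")
>> df.count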
>> 
>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev 
>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>> Chris,
>>> 
>>> good question - as you can see from the code, I set them up on the driver, so I 
>>> expect they will be propagated to all nodes, won't they?
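>>> 
>>> Just in case it is a propagation problem, I could also try pushing them
>>> through SparkConf with the spark.hadoop. prefix (a sketch below, same
>>> placeholder keys), since, as I understand it, Spark copies those entries
>>> into the Hadoop configurations it creates:
>>> 
>>> val sparkConf = new SparkConf()
>>>   .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>   .set("spark.hadoop.fs.s3a.access.key", "---")
>>>   .set("spark.hadoop.fs.s3a.secret.key", "---")
>>> val sc = new SparkContext(sparkConf)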
>>> 
>>> Thank you,
>>> Konstantin Kudryavtsev
>>> 
>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>> are the credentials visible from each Worker node to all the Executor JVMs 
>>>> on each Worker?
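>>>> 
>>>> a quick way to check (just a sketch) is to ask each executor what a plain,
>>>> freshly-loaded Hadoop Configuration - i.e. whatever core-site.xml etc. is
>>>> on that executor's classpath - actually contains:
>>>> 
>>>> import org.apache.hadoop.conf.Configuration
>>>> 
>>>> sc.parallelize(1 to 100, 100)
>>>>   .map(_ => new Configuration().get("fs.s3a.access.key", "<not set>"))
>>>>   .distinct()
>>>>   .collect()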
>>>> 
>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev 
>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>> 
>>>>> Dear Spark community,
>>>>> 
>>>>> I faced the following issue when trying to access data on S3a; my code is 
>>>>> the following:
>>>>> 
>>>>> val sparkConf = new SparkConf()
>>>>> 
>>>>> val sc = new SparkContext(sparkConf)
>>>>> sc.hadoopConfiguration.set("fs.s3a.impl", 
>>>>> "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>> val df = sqlContext.read.parquet(...)
>>>>> df.count
>>>>> 
>>>>> It results in the following exception and log messages:
>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load 
>>>>> credentials from BasicAWSCredentialsProvider: Access key or secret key is 
>>>>> null
>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance 
>>>>> metadata service at URL: 
>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load 
>>>>> credentials from InstanceProfileCredentialsProvider: The requested 
>>>>> metadata is not found at 
>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 
>>>>> 3)
>>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from 
>>>>> any provider in the chain
>>>>>   at 
>>>>> com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>   at 
>>>>> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>   at 
>>>>> com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>   at 
>>>>> com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>   at 
>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>> 
>>>>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>>>> 
>>>>> Any ideas/workarounds?
>>>>> 
>>>>> The AWS credentials are correct for this bucket.
>>>>> 
>>>>> Thank you,
>>>>> Konstantin Kudryavtsev
>> 
>> 
>> 
>> -- 
>> 
>> Chris Fregly
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
> 
