Hi Jerry,

thanks for the hint. Could you please be more specific: how can I pass a
different spark-{user}.conf per user during job submit, and which property
can I use to point to a custom hdfs-site.xml? I tried to google it, but
didn't find anything.
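
Just to make my question concrete, is it something along these lines that
you have in mind? (an untested guess on my side - the spark.hadoop.* prefix
and the file names below are only my assumption):

# spark-user1.conf - per-user properties file, values are placeholders
spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key  ---
spark.hadoop.fs.s3a.secret.key  ---

spark-submit --properties-file spark-user1.conf --class com.example.MyJob my-job.jar

Or is there a dedicated property for pointing a job at a custom
hdfs-site.xml?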

Thank you,
Konstantin Kudryavtsev

On Wed, Dec 30, 2015 at 2:37 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Kostiantyn,
>
> I want to confirm that it works first by using hdfs-site.xml. If yes, you
> could define a different spark-{user-x}.conf per user and source it during
> spark-submit. Let us know whether hdfs-site.xml works first. It should.
>
> Best Regards,
>
> Jerry
>
> Sent from my iPhone
>
> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi Jerry,
>
> I want to run different jobs on different S3 buckets - with different AWS
> creds - on the same instances. Could you shed some light on whether it's
> possible to achieve this with hdfs-site?
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Kostiantyn,
>>
>> Can you define those properties in hdfs-site.xml and make sure it is
>> visible in the class path when you spark-submit? It looks like a conf
>> sourcing issue to me.
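>>
>> Something like the following in hdfs-site.xml is what I have in mind (a
>> rough sketch - I believe these are the s3a property names in hadoop-aws
>> 2.7, please double-check for your version):
>>
>> <configuration>
>>   <property>
>>     <name>fs.s3a.access.key</name>
>>     <value>---</value>
>>   </property>
>>   <property>
>>     <name>fs.s3a.secret.key</name>
>>     <value>---</value>
>>   </property>
>> </configuration>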
>>
>> Cheers,
>>
>> Sent from my iPhone
>>
>> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>> Chris,
>>
>> thanks for the hint with IAM roles, but in my case I need to run
>> different jobs with different S3 permissions on the same cluster, so this
>> approach doesn't work for me as far as I understand it.
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>>> couple things:
>>>
>>> 1) switch to IAM roles if at all possible - explicitly passing AWS
>>> credentials is a long and lonely road in the end
>>>
>>> 2) one really bad workaround/hack is to run a job that hits every worker
>>> and writes the credentials to the proper location (~/.awscredentials or
>>> whatever)
>>>
>>> ^^ i wouldn't recommend this. ^^  it's horrible and doesn't handle
>>> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
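>>>
>>> purely to illustrate what that hack would look like (again: don't do
>>> this), roughly something like the following from the driver - the
>>> credentials file path and the 1000-partition trick are just placeholders
>>> to hit (hopefully) every worker:
>>>
>>> import java.io.{File, PrintWriter}
>>>
>>> sc.parallelize(1 to 1000, 1000).foreachPartition { _ =>
>>>   // write an AWS credentials file on whichever worker runs this task
>>>   val dir = new File(sys.props("user.home"), ".aws")
>>>   dir.mkdirs()
>>>   val out = new PrintWriter(new File(dir, "credentials"))
>>>   try {
>>>     out.println("[default]")
>>>     out.println("aws_access_key_id = ---")
>>>     out.println("aws_secret_access_key = ---")
>>>   } finally out.close()
>>> }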
>>>
>>> if you switch to IAM roles, things become a lot easier as you can
>>> authorize all of the EC2 instances in the cluster - and this handles
>>> autoscaling very well - and at some point, you will want to autoscale.
>>>
>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <
>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>>> Chris,
>>>>
>>>> good question, as you can see from the code I set them up on the driver,
>>>> so I expect they will be propagated to all nodes, won't they?
>>>>
>>>> Thank you,
>>>> Konstantin Kudryavtsev
>>>>
>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>
>>>>> are the credentials visible from each Worker node to all the Executor
>>>>> JVMs on each Worker?
>>>>>
>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <
>>>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>>>
>>>>> Dear Spark community,
>>>>>
>>>>> I faced the following issue when trying to access data on S3a; my code
>>>>> is the following:
>>>>>
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>> import org.apache.spark.sql.SQLContext
>>>>>
>>>>> val sparkConf = new SparkConf()
>>>>> val sc = new SparkContext(sparkConf)
>>>>>
>>>>> // set the s3a credentials directly on the driver's Hadoop configuration
>>>>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>>
>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>> val df = sqlContext.read.parquet(...)
>>>>> df.count
>>>>>
>>>>>
>>>>> It results in the following exception and log messages:
>>>>>
>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: *Access key or secret key is null*
>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>>>   at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>   at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>   at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>   at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>>
>>>>>
>>>>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>>>>
>>>>> Any ideas or workarounds?
>>>>>
>>>>> The AWS credentials themselves are correct for this bucket.
>>>>>
>>>>> Thank you,
>>>>> Konstantin Kudryavtsev
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Chris Fregly*
>>> Principal Data Solutions Engineer
>>> IBM Spark Technology Center, San Francisco, CA
>>> http://spark.tc | http://advancedspark.com
>>>
>>
>>
>
