Hi Kostiantyn,

Yes. If security is a concern, then this approach cannot satisfy it: the keys 
are visible in the properties files. If the goal is to hide them, you might be 
able to go a bit further with this approach. Have you looked at the Spark 
security page?
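
For reference, here is a minimal sketch of the approach discussed in this 
thread. It assumes that Spark copies every spark.hadoop.* entry into the Hadoop 
configuration with the prefix stripped, and that the s3a property names are the 
fs.s3a.* ones used in the original post. It is plain Scala with no Spark 
dependency; the object name SparkHadoopProps, its helpers, the 
spark-<user>.conf file naming, and the placeholder values are all hypothetical.

```scala
import java.io.StringReader
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Spark: it only mirrors the prefix convention.
object SparkHadoopProps {
  // Entries with this prefix are treated as Hadoop properties.
  val Prefix = "spark.hadoop."

  // Parse properties-file text (key and value separated by whitespace, '=' or ':').
  def parse(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
  }

  // Keep only the spark.hadoop.* entries and drop the prefix from each key.
  def hadoopSettings(conf: Map[String, String]): Map[String, String] =
    conf.collect { case (k, v) if k.startsWith(Prefix) => k.stripPrefix(Prefix) -> v }

  // Build a spark-submit command line that selects one user's properties file.
  // The spark-<user>.conf naming is an assumption made for this sketch.
  def submitCommand(user: String, appJar: String): Seq[String] =
    Seq("spark-submit", "--properties-file", s"spark-$user.conf", appJar)
}
```

Parsing a file that contains the line "spark.hadoop.fs.s3a.access.key 
EXAMPLE_ACCESS" and passing the result to hadoopSettings yields the key 
fs.s3a.access.key. The value stays in plain text at every step, which is 
exactly the limitation described above.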

Best Regards,



> On 6 Jan, 2016, at 8:49 am, Kostiantyn Kudriavtsev 
> <kudryavtsev.konstan...@gmail.com> wrote:
> Hi guys,
> the one big issue with this approach: spark.hadoop.fs.s3a.access.key is now 
> visible everywhere (in logs, in the Spark web UI) and is not secured at all...
>> On Jan 2, 2016, at 11:13 AM, KOSTIANTYN Kudriavtsev 
>> <kudryavtsev.konstan...@gmail.com> wrote:
>> thanks Jerry, it works!
>> really appreciate your help 
>> Thank you,
>> Konstantin Kudryavtsev
>>> On Fri, Jan 1, 2016 at 4:35 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>> Hi Kostiantyn,
>>> You should be able to use spark.conf to specify the s3a keys.
>>> I don't remember exactly, but you can add Hadoop properties by prefixing 
>>> them with spark.hadoop.*, where * is the s3a property name. For instance,
>>> spark.hadoop.fs.s3a.access.key wudjgdueyhsj
>>> Of course, you need to make sure the property key is right. I'm using my 
>>> phone so I cannot easily verify it.
>>> Then you can give each user a different spark.conf via 
>>> --properties-file when running spark-submit.
>>> HTH,
>>> Jerry
>>>> On 31 Dec, 2015, at 2:06 pm, KOSTIANTYN Kudriavtsev 
>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>> Hi Jerry,
>>>> what you suggested looks to be working (I put hdfs-site.xml into 
>>>> $SPARK_HOME/conf folder), but could you shed some light on how it can be 
>>>> federated per user?
>>>> Thanks in advance!
>>>> Thank you,
>>>> Konstantin Kudryavtsev
>>>>> On Wed, Dec 30, 2015 at 2:37 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>> Hi Kostiantyn,
>>>>> I want to confirm that it works first by using hdfs-site.xml. If yes, you 
>>>>> could define different spark-{user-x}.conf and source them during 
>>>>> spark-submit. Let us know if hdfs-site.xml works first. It should.
>>>>> Best Regards,
>>>>> Jerry
>>>>>> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev 
>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>> Hi Jerry,
>>>>>> I want to run different jobs against different S3 buckets, with 
>>>>>> different AWS credentials, on the same instances. Could you shed some 
>>>>>> light on whether that is possible with hdfs-site.xml?
>>>>>> Thank you,
>>>>>> Konstantin Kudryavtsev
>>>>>>> On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>>>> Hi Kostiantyn,
>>>>>>> Can you define those properties in hdfs-site.xml and make sure the file 
>>>>>>> is visible on the classpath when you run spark-submit? It looks like a 
>>>>>>> conf sourcing issue to me.
>>>>>>> Cheers,
>>>>>>>> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev 
>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>> Chris,
>>>>>>>> thanks for the hint about IAM roles, but in my case I need to run 
>>>>>>>> different jobs with different S3 permissions on the same cluster, so 
>>>>>>>> this approach doesn't work for me, as far as I understand it
>>>>>>>> Thank you,
>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> 
>>>>>>>>> wrote:
>>>>>>>>> couple things:
>>>>>>>>> 1) switch to IAM roles if at all possible - explicitly passing AWS 
>>>>>>>>> credentials is a long and lonely road in the end
>>>>>>>>> 2) one really bad workaround/hack is to run a job that hits every 
>>>>>>>>> worker and writes the credentials to the proper location 
>>>>>>>>> (~/.awscredentials or whatever)
>>>>>>>>> ^^ i wouldn't recommend this. ^^  it's horrible and doesn't handle 
>>>>>>>>> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
>>>>>>>>> if you switch to IAM roles, things become a lot easier as you can 
>>>>>>>>> authorize all of the EC2 instances in the cluster. this also handles 
>>>>>>>>> autoscaling very well, and at some point you will want to autoscale.
>>>>>>>>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev 
>>>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>>>> Chris,
>>>>>>>>>> good question. as you can see from the code, I set them up on the 
>>>>>>>>>> driver, so I expect they will be propagated to all nodes, won't they?
>>>>>>>>>> Thank you,
>>>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> are the credentials visible from each Worker node to all the 
>>>>>>>>>>> Executor JVMs on each Worker?
>>>>>>>>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev 
>>>>>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>>>>>> Dear Spark community,
>>>>>>>>>>>> I ran into the following issue when trying to access data over s3a. 
>>>>>>>>>>>> My code is the following:
>>>>>>>>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>>>>> import org.apache.spark.sql.SQLContext
>>>>>>>>>>>> val sparkConf = new SparkConf()
>>>>>>>>>>>> val sc = new SparkContext(sparkConf)
>>>>>>>>>>>> // s3a settings are applied to the driver's Hadoop configuration
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.impl", 
>>>>>>>>>>>> "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>>>>>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>>>>>>>>> val df = sqlContext.read.parquet(...)
>>>>>>>>>>>> df.count
>>>>>>>>>>>> It results in the following exception and log messages:
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>>>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>>>>>>>>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>>>>>>>>>>    at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>>>>>>>>    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>>>>>>>>    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>>>>>>>>    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>>>>>>>>    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>>>>>>>>> I am running standalone Spark 1.5.2 with Hadoop 2.7.1
>>>>>>>>>>>> any ideas/workarounds?
>>>>>>>>>>>> AWS credentials are correct for this bucket
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>> -- 
>>>>>>>>> Chris Fregly
>>>>>>>>> Principal Data Solutions Engineer
>>>>>>>>> IBM Spark Technology Center, San Francisco, CA
>>>>>>>>> http://spark.tc | http://advancedspark.com
