Hi Kostiantyn,

Yes. If security is a concern, then this approach cannot satisfy it. The keys are visible in the properties files. If the goal is to hide them, you might be able to go a bit further with this approach. Have you looked at the Spark security page?
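
For what it's worth, here is a rough sketch of the kind of per-user properties file discussed further down this thread, mainly to show where the keys end up in plain text. The file name spark-user-a.conf and both values are made-up placeholders. The key names assume that the spark.hadoop. prefix goes in front of the full Hadoop property names (fs.s3a.access.key and fs.s3a.secret.key, as used in the original code at the bottom of the thread), so please check them against your Spark and Hadoop versions.

```properties
# spark-user-a.conf (hypothetical file name; one such file per user or job)
# Assumption: Spark copies each spark.hadoop.* entry into the Hadoop
# configuration with the "spark.hadoop." prefix removed.
spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
# Placeholder values, stored in plain text in this file.
spark.hadoop.fs.s3a.access.key  <ACCESS_KEY_FOR_USER_A>
spark.hadoop.fs.s3a.secret.key  <SECRET_KEY_FOR_USER_A>
```

A file like this would be passed as spark-submit --properties-file spark-user-a.conf, followed by the usual arguments. Anyone who can read the file can read the keys, and Kostiantyn reports below that they also show up in the logs and the web UI, which is the limitation described above.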

Best Regards,

Jerry

Sent from my iPhone

> On 6 Jan, 2016, at 8:49 am, Kostiantyn Kudriavtsev
> <kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi guys,
>
> the one big issue with this approach:
>>> spark.hadoop.s3a.access.key is now visible everywhere, in logs, in spark
>>> webui and is not secured at all...
>
>> On Jan 2, 2016, at 11:13 AM, KOSTIANTYN Kudriavtsev
>> <kudryavtsev.konstan...@gmail.com> wrote:
>>
>> thanks Jerry, it works!
>> really appreciate your help
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>>> On Fri, Jan 1, 2016 at 4:35 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>> Hi Kostiantyn,
>>>
>>> You should be able to use spark.conf to specify s3a keys.
>>>
>>> I don't remember exactly, but you can add hadoop properties by prefixing
>>> spark.hadoop.*
>>> * is the s3a properties. For instance,
>>>
>>> spark.hadoop.s3a.access.key wudjgdueyhsj
>>>
>>> Of course, you need to make sure the property key is right. I'm using my
>>> phone so I cannot easily verify it.
>>>
>>> Then you can specify a different user by passing a different spark.conf
>>> via --properties-file to spark-submit
>>>
>>> HTH,
>>>
>>> Jerry
>>>
>>> Sent from my iPhone
>>>
>>>> On 31 Dec, 2015, at 2:06 pm, KOSTIANTYN Kudriavtsev
>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>
>>>> Hi Jerry,
>>>>
>>>> what you suggested looks to be working (I put hdfs-site.xml into
>>>> $SPARK_HOME/conf folder), but could you shed some light on how it can be
>>>> federated per user?
>>>> Thanks in advance!
>>>>
>>>> Thank you,
>>>> Konstantin Kudryavtsev
>>>>
>>>>> On Wed, Dec 30, 2015 at 2:37 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>> Hi Kostiantyn,
>>>>>
>>>>> I want to confirm that it works first by using hdfs-site.xml. If yes, you
>>>>> could define different spark-{user-x}.conf and source them during
>>>>> spark-submit. Let us know if hdfs-site.xml works first. It should.
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev
>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jerry,
>>>>>>
>>>>>> I want to run different jobs on different S3 buckets - different AWS
>>>>>> creds - on the same instances. Could you shed some light on whether
>>>>>> it's possible to achieve with hdfs-site?
>>>>>>
>>>>>> Thank you,
>>>>>> Konstantin Kudryavtsev
>>>>>>
>>>>>>> On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>>>> Hi Kostiantyn,
>>>>>>>
>>>>>>> Can you define those properties in hdfs-site.xml and make sure it is
>>>>>>> visible in the class path when you spark-submit? It looks like a conf
>>>>>>> sourcing issue to me.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev
>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Chris,
>>>>>>>>
>>>>>>>> thanks for the hint about IAM roles, but in my case I need to run
>>>>>>>> different jobs with different S3 permissions on the same cluster, so
>>>>>>>> this approach doesn't work for me as far as I understood it
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>
>>>>>>>>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com>
>>>>>>>>> wrote:
>>>>>>>>> couple things:
>>>>>>>>>
>>>>>>>>> 1) switch to IAM roles if at all possible - explicitly passing AWS
>>>>>>>>> credentials is a long and lonely road in the end
>>>>>>>>>
>>>>>>>>> 2) one really bad workaround/hack is to run a job that hits every
>>>>>>>>> worker and writes the credentials to the proper location
>>>>>>>>> (~/.awscredentials or whatever)
>>>>>>>>>
>>>>>>>>> ^^ i wouldn't recommend this. ^^ it's horrible and doesn't handle
>>>>>>>>> autoscaling, but i'm mentioning it anyway as it is a temporary fix.
>>>>>>>>>
>>>>>>>>> if you switch to IAM roles, things become a lot easier as you can
>>>>>>>>> authorize all of the EC2 instances in the cluster - and this handles
>>>>>>>>> autoscaling very well - and at some point, you will want to autoscale.
>>>>>>>>>
>>>>>>>>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev
>>>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>>>> Chris,
>>>>>>>>>>
>>>>>>>>>> good question, as you can see from the code I set them up on the
>>>>>>>>>> driver, so I expect they will be propagated to all nodes, won't they?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> are the credentials visible from each Worker node to all the
>>>>>>>>>>> Executor JVMs on each Worker?
>>>>>>>>>>>
>>>>>>>>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev
>>>>>>>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear Spark community,
>>>>>>>>>>>>
>>>>>>>>>>>> I ran into the following issue when trying to access data on S3a.
>>>>>>>>>>>> My code is the following:
>>>>>>>>>>>>
>>>>>>>>>>>> val sparkConf = new SparkConf()
>>>>>>>>>>>>
>>>>>>>>>>>> val sc = new SparkContext(sparkConf)
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.impl",
>>>>>>>>>>>>   "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>>>>>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>>>>>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>>>>>>>>> val df = sqlContext.read.parquet(...)
>>>>>>>>>>>> df.count
>>>>>>>>>>>>
>>>>>>>>>>>> It results in the following exception and log messages:
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to
>>>>>>>>>>>> load credentials from BasicAWSCredentialsProvider: Access key or
>>>>>>>>>>>> secret key is null
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2
>>>>>>>>>>>> instance metadata service at URL:
>>>>>>>>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to
>>>>>>>>>>>> load credentials from InstanceProfileCredentialsProvider: The
>>>>>>>>>>>> requested metadata is not found at
>>>>>>>>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>>>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage
>>>>>>>>>>>> 1.0 (TID 3)
>>>>>>>>>>>> com.amazonaws.AmazonClientException: Unable to load AWS
>>>>>>>>>>>> credentials from any provider in the chain
>>>>>>>>>>>> at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>>>>>>>> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>>>>>>>> at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>>>>>>>> at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>>>>>>>> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>>>>>>>>>
>>>>>>>>>>>> I run standalone Spark 1.5.2 and am using Hadoop 2.7.1
>>>>>>>>>>>>
>>>>>>>>>>>> any ideas/workarounds?
>>>>>>>>>>>>
>>>>>>>>>>>> AWS credentials are correct for this bucket
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Konstantin Kudryavtsev
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Chris Fregly
>>>>>>>>> Principal Data Solutions Engineer
>>>>>>>>> IBM Spark Technology Center, San Francisco, CA
>>>>>>>>> http://spark.tc | http://advancedspark.com
>