Thanks Jerry, it works! Really appreciate your help.

Thank you,
Konstantin Kudryavtsev
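P.S. For the archives, here is a minimal sketch of the per-user setup Jerry
describes below. The file name, credential values, and jar/class are
placeholders; also note that the part after the spark.hadoop. prefix has to be
the full Hadoop property name (fs.s3a.access.key rather than s3a.access.key):

    # spark-user-a.conf - one properties file per user/bucket (illustrative values)
    spark.hadoop.fs.s3a.impl          org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.access.key    AKIAEXAMPLEKEY
    spark.hadoop.fs.s3a.secret.key    exampleSecretKey

    # submit each job with the matching properties file
    spark-submit --properties-file spark-user-a.conf \
      --class com.example.MyJob my-job.jar

Spark copies any spark.hadoop.* property into the Hadoop Configuration, so this
has the same effect as the sc.hadoopConfiguration.set(...) calls in the original
code, without hard-coding the keys inside the job.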
On Fri, Jan 1, 2016 at 4:35 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Kostiantyn,
>
> You should be able to use spark.conf to specify s3a keys.
>
> I don't remember exactly, but you can add hadoop properties by prefixing
> them with spark.hadoop.*, where * is the s3a property. For instance,
>
> spark.hadoop.s3a.access.key wudjgdueyhsj
>
> Of course, you need to make sure the property key is right. I'm using my
> phone so I cannot easily verify it.
>
> Then you can specify a different user with a different spark.conf, passed
> via --properties-file to spark-submit.
>
> HTH,
>
> Jerry
>
> Sent from my iPhone
>
> On 31 Dec, 2015, at 2:06 pm, KOSTIANTYN Kudriavtsev
> <kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi Jerry,
>
> What you suggested looks to be working (I put hdfs-site.xml into the
> $SPARK_HOME/conf folder), but could you shed some light on how it can be
> federated per user?
> Thanks in advance!
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Wed, Dec 30, 2015 at 2:37 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Kostiantyn,
>>
>> I want to confirm that it works first by using hdfs-site.xml. If yes, you
>> could define different spark-{user-x}.conf files and source them during
>> spark-submit. Let us know if hdfs-site.xml works first. It should.
>>
>> Best Regards,
>>
>> Jerry
>>
>> Sent from my iPhone
>>
>> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev
>> <kudryavtsev.konstan...@gmail.com> wrote:
>>
>> Hi Jerry,
>>
>> I want to run different jobs on different S3 buckets - with different AWS
>> creds - on the same instances. Could you shed some light on whether that
>> can be achieved with hdfs-site?
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>> On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi Kostiantyn,
>>>
>>> Can you define those properties in hdfs-site.xml and make sure it is
>>> visible on the classpath when you spark-submit? It looks like a conf
>>> sourcing issue to me.
>>>
>>> Cheers,
>>>
>>> Sent from my iPhone
>>>
>>> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev
>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>> Chris,
>>>
>>> Thanks for the hint about IAM roles, but in my case I need to run
>>> different jobs with different S3 permissions on the same cluster, so this
>>> approach doesn't work for me as far as I understand it.
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>
>>>> A couple of things:
>>>>
>>>> 1) switch to IAM roles if at all possible - explicitly passing AWS
>>>> credentials is a long and lonely road in the end
>>>>
>>>> 2) one really bad workaround/hack is to run a job that hits every
>>>> worker and writes the credentials to the proper location
>>>> (~/.awscredentials or whatever)
>>>>
>>>> ^^ I wouldn't recommend this ^^ - it's horrible and doesn't handle
>>>> autoscaling, but I'm mentioning it anyway as a temporary fix.
>>>>
>>>> If you switch to IAM roles, things become a lot easier: you can
>>>> authorize all of the EC2 instances in the cluster, and it handles
>>>> autoscaling very well - and at some point, you will want to autoscale.
>>>>
>>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev
>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>
>>>>> Chris,
>>>>>
>>>>> Good question; as you can see from the code, I set them on the driver,
>>>>> so I expect they will be propagated to all nodes, won't they?
>>>>>
>>>>> Thank you,
>>>>> Konstantin Kudryavtsev
>>>>>
>>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>>
>>>>>> Are the credentials visible from each Worker node to all the Executor
>>>>>> JVMs on each Worker?
>>>>>>
>>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev
>>>>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>
>>>>>> Dear Spark community,
>>>>>>
>>>>>> I faced the following issue when trying to access data on S3a; my
>>>>>> code is the following:
>>>>>>
>>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>> import org.apache.spark.sql.SQLContext
>>>>>>
>>>>>> val sparkConf = new SparkConf()
>>>>>> val sc = new SparkContext(sparkConf)
>>>>>> sc.hadoopConfiguration.set("fs.s3a.impl",
>>>>>>   "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>>>
>>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>>> val df = sqlContext.read.parquet(...)
>>>>>> df.count
>>>>>>
>>>>>> It results in the following exception and log messages:
>>>>>>
>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load
>>>>>> credentials from BasicAWSCredentialsProvider: Access key or secret key
>>>>>> is null
>>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance
>>>>>> metadata service at URL:
>>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load
>>>>>> credentials from InstanceProfileCredentialsProvider: The requested
>>>>>> metadata is not found at
>>>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials
>>>>>> from any provider in the chain
>>>>>> at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>> at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>> at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>>>
>>>>>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>>>>>
>>>>>> Any ideas/workarounds?
>>>>>>
>>>>>> The AWS credentials are correct for this bucket.
>>>>>>
>>>>>> Thank you,
>>>>>> Konstantin Kudryavtsev
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Chris Fregly
>>>> Principal Data Solutions Engineer
>>>> IBM Spark Technology Center, San Francisco, CA
>>>> http://spark.tc | http://advancedspark.com
>>>
>>
>
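For reference, the hdfs-site.xml approach that worked upthread (putting the
file into $SPARK_HOME/conf so it is visible on the classpath at spark-submit
time) would look roughly like the sketch below; the credential values are
placeholders, and the exact set of fs.s3a.* properties should be double-checked
against the Hadoop 2.7.x s3a documentation:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <!-- illustrative values only; never publish real keys -->
      <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
      </property>
      <property>
        <name>fs.s3a.access.key</name>
        <value>AKIAEXAMPLEKEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>exampleSecretKey</value>
      </property>
    </configuration>

Credentials stored in a plain config file like this are readable by anyone who
can read the file, which is part of why Chris recommends IAM roles whenever a
single set of instance-level permissions is acceptable.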