I've found a workaround. I set up an HTTP server serving the jar, and pointed 
to the HTTP URL in spark-submit.
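
For reference, the workaround looks roughly like this (the host, port, and paths are made up for illustration):

```shell
# Serve the directory containing the jar over HTTP; Python's built-in
# server is enough for this:
cd /path/to/jars
python -m http.server 8000 &

# Point spark-submit at the HTTP URL instead of the s3:// one; the
# driver fetches the jar over plain HTTP, so no S3 credentials are
# involved:
spark-submit \
  --class foo \
  --master spark://master-ip:7077 \
  --deploy-mode cluster \
  http://jar-host:8000/foo.jar
```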
Which brings me to ask... would it be a good option to allow spark-submit to 
upload a local jar to the master, which the master could then serve via an HTTP 
interface? The master already runs a web UI, so I imagine we could allow it to 
receive jars and serve them as well. Perhaps an additional flag could signify 
that the local jar should be uploaded in this manner? I'd be happy to take a 
stab at it... but thoughts?
-Ashic.

From: as...@live.com
To: lohith.sam...@mphasis.com; user@spark.apache.org
Subject: RE: Cluster mode deployment from jar in S3
Date: Mon, 4 Jul 2016 11:30:31 +0100

Hi Lohith,

Thanks for the response.

The S3 bucket does have access restrictions, but the instances on which the 
Spark master and workers run have an IAM role policy that allows them access to 
it. As such, we don't really configure the CLI with credentials... the IAM roles 
take care of that. Is there a way to make Spark work the same way? Or should I 
get temporary credentials somehow (like 
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
 ), and use them to somehow submit the job? I guess I'll have to set it via 
environment variables; I can't put it in application code, as the issue is in 
downloading the jar from S3.
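
Something like this is what I have in mind (an untested sketch — I haven't verified that the standalone driver picks up spark.hadoop.* settings when fetching the jar, and the key values are placeholders):

```shell
# Fetch temporary credentials first (e.g. via STS or the instance
# metadata service), then try passing them through to the Hadoop S3
# layer that downloads the jar:
spark-submit \
  --class foo \
  --master spark://master-ip:7077 \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=AKIA-PLACEHOLDER \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=SECRET-PLACEHOLDER \
  s3://bucket/dir/foo.jar
```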
-Ashic.

From: lohith.sam...@mphasis.com
To: as...@live.com; user@spark.apache.org
Subject: RE: Cluster mode deployment from jar in S3
Date: Mon, 4 Jul 2016 09:50:50 +0000

Hi,
    The aws CLI already has your access key ID and secret access key from when 
you initially configured it.
    Is your S3 bucket without any access restrictions?

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga
From: Ashic Mahtab [mailto:as...@live.com]
Sent: Monday, July 04, 2016 15.06
To: Apache Spark
Subject: RE: Cluster mode deployment from jar in S3

Sorry to do this... but... *bump*

From: as...@live.com
To: user@spark.apache.org
Subject: Cluster mode deployment from jar in S3
Date: Fri, 1 Jul 2016 17:45:12 +0100

Hello,

I've got a Spark stand-alone cluster using EC2 instances. I can submit jobs 
using "--deploy-mode client", however using "--deploy-mode cluster" is proving 
to be a challenge. I've tried this:

spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster 
s3://bucket/dir/foo.jar


When I do this, I get:

16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: 
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key 
must be specified as the username or password (respectively) of a s3 URL, or by 
setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
(respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key 
must be specified as the username or password (respectively) of a s3 URL, or by 
setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties 
(respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

Now I'm not using any S3 or Hadoop stuff within my code (it's just an 
sc.parallelize(1 to 100)). So, I imagine it's the driver trying to fetch the 
jar. I haven't set the AWS Access Key ID and Secret as mentioned, but the role 
the machines are in allows them to copy the jar. In other words, this works:

aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar

I'm using Spark 1.6.2, and can't really think of what I can do so that I can 
submit the jar from S3 using cluster deploy mode. I've also tried simply 
downloading the jar onto a node and spark-submitting that... that works in 
client mode, but I get a not-found error when using cluster mode.
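
My suspicion about the not-found error: in cluster mode the driver runs on one of the workers, not on the machine I submit from, so a local jar would have to exist at the same path on every node. If that's right, something like this might work (untested sketch; workers.txt and the paths are illustrative):

```shell
# Copy the jar to the same path on every worker, since the driver may
# be launched on any of them:
for host in $(cat workers.txt); do
  scp /tmp/foo.jar "$host:/tmp/foo.jar"
done

# Then submit with an explicit file:// URL:
spark-submit --class foo --master spark://master-ip:7077 \
  --deploy-mode cluster file:///tmp/foo.jar
```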

Any help will be appreciated.

Thanks,
Ashic.