Thanks - I've seen that SO post; it covers spark-submit, which I am not using.
Regarding the ALLUXIO_SPARK_CLIENT variable: it is located on the machine that runs the job that spawns the master=local Spark instance. According to the Spark documentation this should be possible, but in practice it appears not to be.

Once again - I'm trying to solve the use case for master=local, NOT for a cluster and NOT with spark-submit.
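For concreteness, here is a minimal sketch of what I am attempting (the jar path and Alluxio URI are placeholders for my actual setup, and I am assuming `alluxio.hadoop.FileSystem` is the right implementation class):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the master=local setup described above.
// The jar path stands in for whatever ALLUXIO_SPARK_CLIENT points at.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alluxio-local-test")
  // Classpath-related properties must be set before the session is
  // created; spark.conf.set() after startup is too late for them.
  .config("spark.driver.extraClassPath", "/path/to/alluxio-client.jar")
  .config("spark.executor.extraClassPath", "/path/to/alluxio-client.jar")
  .config("spark.hadoop.fs.alluxio.impl", "alluxio.hadoop.FileSystem")
  .getOrCreate()

// In local mode the driver JVM is already running at this point, so the
// driver extraClassPath set above may simply be ignored, which would be
// consistent with the behavior I am seeing.
val df = spark.read.text("alluxio://localhost:19998/some/file.txt")
```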
> On Apr 13, 2018, at 12:47 PM, yohann jardin <yohannjar...@hotmail.com> wrote:
>
> Hey Jason,
>
> It might be related to what is behind your ALLUXIO_SPARK_CLIENT variable and where the lib is located (is it on HDFS, on the node that submits the job, or local to all Spark workers?).
>
> There is a great post on SO about it: https://stackoverflow.com/a/37348234
>
> We might as well check that you are providing the jar correctly based on its location. I have found it tricky in some cases.
>
> As a debugging step, if the jar is not on HDFS, you can copy it there and then specify the full path in the extraClassPath property.
>
> Regards,
> Yohann Jardin
>
> Le 4/13/2018 à 5:38 PM, Jason Boorn a écrit :
>> I do, and this is what I will fall back to if nobody has a better idea :)
>>
>> I was just hoping to get this working as it is much more convenient for my testing pipeline.
>>
>> Thanks again for the help
>>
>>> On Apr 13, 2018, at 11:33 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
>>>
>>> Ok - `LOCAL` makes sense now.
>>>
>>> Do you have the option to still use `spark-submit` in this scenario, but with the following options:
>>>
>>> ```bash
>>> --master "local[*]" \
>>> --deploy-mode "client" \
>>> ...
>>> ```
>>>
>>> I know in the past I have set up some options using `.config("Option", "value")` when creating the Spark session, and then other runtime options as you describe above with `spark.conf.set`. At this point, though, I've just moved everything out into a `spark-submit` script.
>>>
>>> On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn <jbo...@gmail.com> wrote:
>>> Hi Geoff -
>>>
>>> Appreciate the help here - I do understand what you're saying below, and I am able to get this working when I submit a job to a local cluster.
>>>
>>> I think part of the issue here is that there's ambiguity in the terminology. When I say "LOCAL" Spark, I mean an instance of Spark that is created by my driver program and is not a cluster itself. It means that my master node is "local", and this mode is primarily used for testing:
>>>
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html
>>>
>>> While I am able to get Alluxio working with spark-submit, I am unable to get it working in local mode. The mechanisms for setting classpaths during spark-submit are not available in local mode. My understanding is that all one is able to use is:
>>>
>>> spark.conf.set("")
>>>
>>> to set any runtime properties of the local instance. Note that it is possible (and I am more convinced of this as time goes on) that Alluxio simply does not work in Spark local mode as described above.
>>>
>>>> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen <ge...@ibleducation.com> wrote:
>>>>
>>>> I fought with a ClassNotFoundException for quite some time, but it was for Kafka.
>>>>
>>>> The final configuration that got everything working was running spark-submit with the following options:
>>>>
>>>> ```bash
>>>> --jars "/path/to/.ivy2/jars/package.jar" \
>>>> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
>>>> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
>>>> --packages org.some.package:package_name:version
>>>> ```
>>>>
>>>> While this was needed for me to run in cluster mode, it works equally well in client mode.
>>>>
>>>> One other note when you need to supply multiple items to these args: --jars and --packages should be comma-separated; --driver-class-path and extraClassPath should be colon-separated.
>>>>
>>>> HTH
>>>>
>>>> On Fri, Apr 13, 2018 at 4:28 AM, jb44 <jbo...@gmail.com> wrote:
>>>> Haoyuan -
>>>>
>>>> As I mentioned below, I've been through the documentation already. It has not helped me to resolve the issue.
>>>>
>>>> Here is what I have tried so far:
>>>>
>>>> - setting extraClassPath as explained below
>>>> - adding fs.alluxio.impl through SparkConf
>>>> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe this matters in my case)
>>>> - compiling the client from source
>>>>
>>>> Do you have any other suggestions on how to get this working?
>>>>
>>>> Thanks
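For reference, my understanding is that Geoff's separator rules carry over when the equivalent properties are set programmatically rather than on the spark-submit command line. A sketch with placeholder paths (not my actual setup):

```scala
import org.apache.spark.SparkConf

// Separator conventions for multi-jar values (paths are placeholders):
val conf = new SparkConf()
  // spark.jars takes a comma-separated list, like --jars and --packages
  .set("spark.jars", "/path/to/a.jar,/path/to/b.jar")
  // the extraClassPath properties use the platform path separator
  // (':' on Unix-like systems), like --driver-class-path
  .set("spark.driver.extraClassPath", "/path/to/a.jar:/path/to/b.jar")
  .set("spark.executor.extraClassPath", "/path/to/a.jar:/path/to/b.jar")
```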