Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Gourav Sengupta
Hi, if you start spark or pyspark from the command line with the --jars option and see that things work fine, then it means that you will have to either add the jar to the SPARK_HOME jars directory or modify the spark-env file so that it includes the path to the location of the jar file.
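A quick way to verify that either change took effect is to look the class up from the same JVM before creating the session (class name taken from the error in the original post at the bottom of this thread); a minimal sketch:

```scala
// Throws ClassNotFoundException while the Alluxio client jar is
// missing from this JVM's classpath; succeeds once it is visible.
Class.forName("alluxio.hadoop.FileSystem")
```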

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Jason Boorn
Ok great, I’ll give that a shot - thanks for all the help.

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Gene Pang
Yes, I think that is the case. I haven't tried that before, but it should work. Thanks, Gene

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Hi Gene - Are you saying that I just need to figure out how to get the Alluxio jar into the classpath of my parent application? And that if it shows up in the classpath, Spark will automatically know it needs to use it when communicating with Alluxio? Apologies for going back-and-forth on this.

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Gene Pang
Hi Jason, Alluxio does work with Spark in master=local mode. This is because both spark-submit and spark-shell have command-line options to set the classpath for the JVM that is being started. If you are not using spark-submit or spark-shell, you will have to figure out how to configure the classpath of that JVM yourself.
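One way to do that for a parent application built with sbt is to declare the Alluxio client as an ordinary dependency, so the jar lands on the application JVM's classpath at launch; a minimal sketch (the artifact coordinates are an assumption for Alluxio 1.7 and should be checked against the version actually deployed):

```scala
// build.sbt (sketch; verify coordinates against your Alluxio version)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"              % "2.3.0",
  // provides alluxio.hadoop.FileSystem for the embedded local-mode driver
  "org.alluxio"       % "alluxio-core-client-fs" % "1.7.1"
)
```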

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Ok thanks - I was basing my design on this: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html wherein it says: "Once the SparkSession is instantiated, you can ..."

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Marcelo Vanzin
There are two things you're doing wrong here:

On Thu, Apr 12, 2018 at 6:32 PM, jb44 wrote:
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)

First one: you can't modify JVM configuration after it has already started.
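To make the distinction concrete, a sketch (paths hypothetical): spark.driver.extraClassPath is read when the driver JVM launches, and in master=local mode the driver is the parent application's own JVM, so setting it on a running session changes nothing:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("classpath-sketch")
  .getOrCreate()

// Too late: this JVM (the local-mode driver) is already running, so a
// JVM-level option like extraClassPath is silently ignored here.
spark.conf.set("spark.driver.extraClassPath", "/path/to/alluxio-client.jar")

// The jar has to be on this JVM's classpath from the start instead, e.g.
// (hypothetical launch command):
//   java -cp myapp.jar:/path/to/alluxio-client.jar com.example.Main
```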

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Thanks - I’ve seen this SO post; it covers spark-submit, which I am not using. Regarding the ALLUXIO_SPARK_CLIENT variable: the jar it points to is located on the machine that runs the job which spawns the master=local spark. According to the Spark documentation this should be possible, but it appears it is not.

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread yohann jardin
Hey Jason, this might be related to what is behind your variable ALLUXIO_SPARK_CLIENT and where the lib is located (is it on HDFS, on the node that submits the job, or local to all spark workers?). There is a great post on SO about it: https://stackoverflow.com/a/37348234. We might as well check ...

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
I do, and this is what I will fall back to if nobody has a better idea :) I was just hoping to get this working as it is much more convenient for my testing pipeline. Thanks again for the help.

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Geoff Von Allmen
Ok - `LOCAL` makes sense now. Do you have the option to still use `spark-submit` in this scenario, but with the following options:

```bash
--master "local[*]" \
--deploy-mode "client" \
...
```

I know in the past I have set up some options using `.config("Option", "value")` when creating the SparkSession.
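For reference, a minimal sketch of that builder-time `.config(...)` pattern (the key shown is just an illustrative Spark option, not a fix confirmed in this thread):

```scala
import org.apache.spark.sql.SparkSession

// Options passed here are applied when the session (and, in local mode,
// the embedded driver) is created, not afterwards.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("builder-config-sketch")
  .config("spark.sql.shuffle.partitions", "4") // illustrative option
  .getOrCreate()
```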

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Hi Geoff - Appreciate the help here - I do understand what you’re saying below, and I am able to get this working when I submit a job to a local cluster. I think part of the issue here is that there’s ambiguity in the terminology. When I say “LOCAL” spark, I mean an instance of spark that is created by my parent application with master=local, not one launched via spark-submit.
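Since the terminology caused confusion, a sketch of the two setups being distinguished (commands and paths hypothetical):

```scala
// (a) "local cluster": job launched via spark-submit, where flags like
//     --jars / --driver-class-path can inject the Alluxio client, e.g.:
//   spark-submit --master "local[*]" --jars /path/to/alluxio-client.jar myapp.jar
//
// (b) "LOCAL" spark in this thread: the parent application itself creates
//     the session, so no spark-submit flags are ever in play:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()
```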

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Geoff Von Allmen
I fought with a ClassNotFoundException for quite some time, but it was for kafka. The final configuration that got everything working was running spark-submit with the following options:

```bash
--jars "/path/to/.ivy2/jars/package.jar" \
--driver-class-path "/path/to/.ivy2/jars/package.jar" \
--conf ...
```

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread jb44
Haoyuan - As I mentioned below, I've been through the documentation already. It has not helped me resolve the issue. Here is what I have tried so far (see the sketch after this list):

- setting extraClassPath as explained below
- adding fs.alluxio.impl through sparkconf
- adding spark.sql.hive.metastore.sharedPrefixes ...
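For concreteness, a sketch of what those attempts look like in code (paths hypothetical; per this message, none of them resolved the error):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the three attempts listed above; none fixed the
// ClassNotFoundException in this thread. Paths are hypothetical.
val spark = SparkSession.builder()
  .master("local[*]")
  // 1. extraClassPath (ignored once the driver JVM is already running)
  .config("spark.driver.extraClassPath", "/path/to/alluxio-client.jar")
  // 2. register the Alluxio FileSystem implementation via Hadoop conf
  .config("spark.hadoop.fs.alluxio.impl", "alluxio.hadoop.FileSystem")
  // 3. share alluxio classes with the isolated Hive metastore classloader
  .config("spark.sql.hive.metastore.sharedPrefixes", "alluxio")
  .getOrCreate()
```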

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-12 Thread Haoyuan Li
This link should be helpful: https://alluxio.org/docs/1.7/en/Running-Spark-on-Alluxio.html Best regards, Haoyuan (HY) alluxio.com | alluxio.org | powered by Alluxio

Spark LOCAL mode and external jar (extraClassPath)

2018-04-12 Thread jb44
I'm running spark in LOCAL mode and trying to get it to talk to alluxio. I'm getting the error: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. The cause of this error is apparently that Spark cannot find the alluxio client jar in its classpath. I have looked at the documentation already.
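A minimal sketch reproducing the setup described (the Alluxio URI is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Local-mode session created directly by the application; no spark-submit.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alluxio-repro")
  .getOrCreate()

// Reading through an alluxio:// URI fails while the client jar is absent:
// java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
val df = spark.read.text("alluxio://localhost:19998/data/input.txt")
```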