ryanchou created SPARK-10658:
--------------------------------

             Summary: Could pyspark provide addJars() as scala spark API?
                 Key: SPARK-10658
                 URL: https://issues.apache.org/jira/browse/SPARK-10658
             Project: Spark
          Issue Type: Wish
          Components: PySpark
    Affects Versions: 1.3.1
         Environment: Linux ubuntu 14.01 LTS
            Reporter: ryanchou
My Spark program is written with the PySpark API and uses the spark-csv jar library. I can submit the job with spark-submit, passing the `--jars` argument so the spark-csv jar is loaded:

```
/bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py
```

However, I also need to run my unit tests like this:

```
py.test -vvs test_xxx.py
```

which gives me no way to pass a `--jars` argument. I therefore tried the `SparkContext.addPyFile()` API to add the jar in my test_xxx.py, because addPyFile()'s docs mention PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does that mean I can add *.jar files (jar libraries) with addPyFile()? The code using addPyFile() to add the jar is below:

```
self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar"))
sqlContext = SQLContext(self.sc)
self.dataframe = sqlContext.load(
    source="com.databricks.spark.csv",
    header="true",
    path="xxx.csv"
)
```

This does not work: sqlContext cannot load the source (com.databricks.spark.csv).

Eventually I found another way, setting the environment variable SPARK_CLASSPATH to load the jar libraries:

```
SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py
```

With this, the jars are loaded and sqlContext can load the source successfully, the same as passing `--jars xxx1.jar`.

So, for the situation of using third-party jars in PySpark-written scripts (.py and .egg files work well via addPyFile()), where `--jars` cannot be used (e.g. `py.test -vvs test_xxx.py`): have you ever planned to provide an API such as addJars() in the Scala API for adding jars to a Spark program, or is there a better way to add jars that I haven't found yet?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
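As a sketch of one more workaround for the py.test situation described above: the `--jars` flag can be injected through the `PYSPARK_SUBMIT_ARGS` environment variable before pyspark is imported, so the test process picks up the jar without a spark-submit command line. Whether this is honored depends on the Spark version in use, and the jar path is the one from this report, used here purely for illustration:

```python
import os

# Set PYSPARK_SUBMIT_ARGS *before* any `import pyspark`, so the gateway
# JVM is launched with the extra jar on its classpath. The trailing
# "pyspark-shell" token is required by pyspark's launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /path/spark-csv_2.10-1.1.0.jar pyspark-shell"
)

# After this, creating the SparkContext / SQLContext in test_xxx.py and
# calling sqlContext.load(source="com.databricks.spark.csv", ...) should
# find the package, the same way the SPARK_CLASSPATH workaround does.
```

This keeps the jar configuration inside the test file itself instead of requiring every developer to export SPARK_CLASSPATH before running py.test.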