ryanchou created SPARK-10658:
--------------------------------
Summary: Could pyspark provide addJars() as scala spark API?
Key: SPARK-10658
URL: https://issues.apache.org/jira/browse/SPARK-10658
Project: Spark
Issue Type: Wish
Components: PySpark
Affects Versions: 1.3.1
Environment: Linux ubuntu 14.01 LTS
Reporter: ryanchou
My Spark program is written with the PySpark API and uses the spark-csv jar
library.
I can submit the job with spark-submit, adding the `--jars` argument to include
the spark-csv jar, as in:
```
/bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py
```
However, I need to run my unit tests with:
```
py.test -vvs test_xxx.py
```
In that case there is no way to pass a `--jars` argument.
Therefore I tried the SparkContext.addPyFile() API to add the jars in my
test_xxx.py, because addPyFile()'s documentation mentions
PACKAGES_EXTENSIONS = (.zip, .py, .jar).
Does that mean I can add *.jar files (jar libraries) using addPyFile()?
The code that uses addPyFile() to add the jar is as below:
```
self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar"))
sqlContext = SQLContext(self.sc)
self.dataframe = sqlContext.load(
    source="com.databricks.spark.csv",
    header="true",
    path="xxx.csv"
)
```
However, it doesn't work: sqlContext cannot load the source
(com.databricks.spark.csv), presumably because addPyFile() only makes the file
available on the Python path, not on the JVM classpath.
Eventually I found another way: setting the environment variable
SPARK_CLASSPATH to load the jar libraries:
```
SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py
```
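Since SPARK_CLASSPATH has to be visible to the JVM before the first SparkContext is launched, the same effect can presumably be achieved from inside the test module itself, before any PySpark code runs. A minimal sketch (lib_path and the jar names are hypothetical; adjust them to your environment):

```python
import os
from os.path import join

# Hypothetical location of the third-party jars; adjust to your setup.
lib_path = "/path/to/libs"
jars = [
    join(lib_path, "spark-csv_2.10-1.1.0.jar"),
    join(lib_path, "commons-csv-1.1.jar"),
]

# SPARK_CLASSPATH is read when the JVM is launched, so this must run
# before the first SparkContext is constructed in the test process
# (e.g. at the top of test_xxx.py or in a conftest.py).
os.environ["SPARK_CLASSPATH"] = ":".join(jars)
```

This keeps the jar configuration inside the test code instead of requiring every developer to remember the environment variable on the py.test command line.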
This loads the jar libraries, and sqlContext can load the source successfully,
just as when passing `--jars xxx1.jar` arguments.
So, for the situation of using third-party jars in PySpark-written scripts
(.py and .egg files work well via addPyFile()), `--jars` cannot be used when
running under py.test (py.test -vvs test_xxx.py).
Have you ever planned to provide an API such as addJars() in the Scala API for
adding jars to a Spark program, or is there a better way to add jars that I
haven't found yet?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]