[
https://issues.apache.org/jira/browse/SPARK-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kun Liu closed SPARK-15969.
---------------------------
Resolution: Done
Seems to be working, so I am closing this JIRA.
> FileNotFoundException: Multiple arguments for py-files flag, (also jars) for
> spark-submit
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-15969
> URL: https://issues.apache.org/jira/browse/SPARK-15969
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 1.5.0, 1.6.1
> Environment: Mac OS X 10.11.5
> Reporter: Kun Liu
> Priority: Minor
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> This is my first JIRA issue and I am new to the Spark community, so please
> correct me if I am wrong. Thanks.
> An exception, java.io.FileNotFoundException, occurred when multiple arguments
> were specified for the --py-files (also --jars) flag.
> I searched for a while but only found a similar issue on Windows:
> https://issues.apache.org/jira/browse/SPARK-6435
> My test environment was Mac OS X with Spark versions 1.5.0 and 1.6.1.
> 1.1 Observations:
> 1) Quoting the arguments makes no difference; the result is always the same
> 2) The first path before the comma, as long as it is valid, is not a problem
> whether it is an absolute or a relative path
> 3) The second and further py-files paths are not a problem if ALL of them
> are:
> a. relative paths under the working directory ($PWD); OR
> b. specified using an environment variable at the beginning, e.g.
> $ENV_VAR/path/to/file; OR
> c. preprocessed by $(echo path/to/*.py | tr ' ' ','), whether absolute
> or relative, as long as they are valid
> 4) The path of the driver program, assuming it is valid, does not matter, as
> it is a single file
> 1.2 Experiments:
> Assuming main.py calls functions from helper1.py and helper2.py, and all
> paths below are valid.
> ~/Desktop/testpath: main.py, helper1.py, helper2.py
> $SPARK_HOME/testpath: helper1.py, helper2.py
> 1) Successful output:
> a. Multiple python paths are relative paths under the same directory as
> the working directory
> cd $SPARK_HOME
> bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py \
> ~/Desktop/testpath/main.py
> cd ~/Desktop
> $SPARK_HOME/bin/spark-submit \
> --py-files testpath/helper1.py,testpath/helper2.py testpath/main.py
> b. Multiple python paths are specified using an environment variable
> export TEST_DIR=~/Desktop/testpath
> cd ~
> $SPARK_HOME/bin/spark-submit \
> --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py \
> ~/Desktop/testpath/main.py
>
> cd ~/Documents
> $SPARK_HOME/bin/spark-submit \
> --py-files $TEST_DIR/helper1.py,$TEST_DIR/helper2.py \
> ~/Desktop/testpath/main.py
> c. Multiple paths (absolute or relative) after being preprocessed:
> $SPARK_HOME/bin/spark-submit \
> --py-files $(echo $SPARK_HOME/testpath/helper*.py | tr ' ' ',') \
> ~/Desktop/testpath/main.py
> cd ~/Desktop
> $SPARK_HOME/bin/spark-submit \
> --py-files $(echo testpath/helper*.py | tr ' ' ',') \
> ~/Desktop/testpath/main.py
> (reference link:
> http://stackoverflow.com/questions/24855368/spark-throws-classnotfoundexception-when-using-jars-option)
> 2) Failure output: the second python path is an absolute one; the same
> problem occurs for any further paths
> cd ~/Documents
> $SPARK_HOME/bin/spark-submit \
> --py-files ~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py \
> ~/Desktop/testpath/main.py
> py4j.protocol.Py4JJavaError: An error occurred while calling
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.io.FileNotFoundException: Added file
> file:/Users/kunliu/Documents/~/Desktop/testpath/helper2.py does not exist.
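A note on the failing case above: the path in the exception,
file:/Users/kunliu/Documents/~/Desktop/testpath/helper2.py, still contains a
literal ~. That is consistent with how POSIX-style shells handle tilde
expansion: an unquoted ~ is expanded only at the start of a word (and after =
or : in variable assignments), so a ~ that follows a comma is passed on
unchanged and then looks like a relative path. The sketch below shows this
with plain printf, without Spark. It assumes a POSIX-style shell such as bash;
the paths and variable names are illustrative only.

```shell
# Tilde expansion applies only to a ~ at the start of a word, so in a
# comma-separated list only the first path is expanded by the shell.
tilde_list=$(printf '%s' ~/testpath/helper1.py,~/testpath/helper2.py)
printf 'with ~:     %s\n' "$tilde_list"

# $HOME is a parameter expansion, which the shell performs anywhere in a
# word, so every path in the list ends up absolute.
home_list="$HOME/testpath/helper1.py,$HOME/testpath/helper2.py"
printf 'with $HOME: %s\n' "$home_list"

# Possible workaround for the failing command (not run here):
#   $SPARK_HOME/bin/spark-submit \
#     --py-files "$HOME/Desktop/testpath/helper1.py,$HOME/Desktop/testpath/helper2.py" \
#     ~/Desktop/testpath/main.py
```

This would also match observation 3) b., where paths that start with an
environment variable work regardless of the working directory.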
> 1.3 Conclusions
> I suggest that the --py-files flag of spark-submit support absolute paths
> for all of its arguments, not just relative paths under the working directory.
> If necessary, I would like to submit a pull request and start working on it
> as my first contribution to the Spark community.
> 1.4 Note
> 1) I think the same issue will happen when multiple jar files delimited by
> commas are passed to the --jars flag for Java applications.
> 2) I suggest wildcard path arguments should also be supported, as indicated
> by https://issues.apache.org/jira/browse/SPARK-3451
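As a sketch of the wildcard idea in note 2), the preprocessing from
observation 3) c. can be done in the shell before spark-submit is called. The
echo ... | tr ' ' ',' form also replaces spaces that occur inside a file name;
joining the expanded glob through IFS avoids that. This is a minimal example
assuming a POSIX-style shell. join_with_commas is a hypothetical helper, not
part of Spark, and the temporary directory only stands in for a real source
tree.

```shell
# join_with_commas: hypothetical helper, not part of Spark.
# Prints its arguments as a single comma-separated string. The body runs in
# a subshell, so the IFS change does not leak into the caller.
join_with_commas() (
    IFS=,
    printf '%s' "$*"
)

# Stand-in directory with two helper modules, created only for the demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/helper1.py" "$demo_dir/helper2.py"

# The shell expands the glob into separate words; the helper joins them.
py_files=$(join_with_commas "$demo_dir"/helper*.py)
printf '%s\n' "$py_files"

# Possible use with spark-submit (not run here):
#   $SPARK_HOME/bin/spark-submit --py-files "$py_files" ~/Desktop/testpath/main.py

rm -rf "$demo_dir"
```

A path that itself contains a comma would presumably still be split by
spark-submit, since the flag takes a comma-separated list.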