[ 
https://issues.apache.org/jira/browse/SPARK-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kun Liu updated SPARK-15969:
----------------------------
    Remaining Estimate: 120h  (was: 168h)
     Original Estimate: 120h  (was: 168h)

> FileNotFoundException: Multiple arguments for py-files flag, (also jars) for 
> spark-submit
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-15969
>                 URL: https://issues.apache.org/jira/browse/SPARK-15969
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.0, 1.6.1
>         Environment: Mac OS X 10.11.5
>            Reporter: Kun Liu
>            Priority: Minor
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> This is my first JIRA issue and I am new to the Spark community, so please 
> correct me if I am wrong. Thanks.
> A java.io.FileNotFoundException is thrown when multiple arguments are 
> specified for the --py-files (also --jars) flag.
> I searched for a while but only found a similar issue on Windows OS: 
> https://issues.apache.org/jira/browse/SPARK-6435
> My test environment was Mac OS X with Spark versions 1.5.0 and 1.6.1.
> 1.1 Observations:
> 1) Quoting the arguments makes no difference; the result is always the 
> same.
> 2) The first path before the comma, as long as it is valid, is never a 
> problem, whether it is absolute or relative.
> 3) The second and further py-files paths are not a problem if ALL of them 
> are:
>       a. relative paths under the working directory ($PWD); OR
>       b. specified with an environment variable at the beginning, e.g. 
> $ENV_VAR/path/to/file; OR
>       c. preprocessed by $(echo path/to/*.py | tr ' ' ','), whether 
> absolute or relative, as long as they are valid.
> 4) The path of the driver program, assuming it is valid, does not matter, 
> as it is a single file.
> 1.2 Experiments:
> Assuming main.py calls functions from helper1.py and helper2.py, and all 
> paths below are valid.
> ~/Desktop/testpath: main.py, helper1.py, helper2.py
> $SPARK_HOME/testpath: helper1.py, helper2.py
> 1) Successful output:
>       a. Multiple Python paths are relative paths under the working 
> directory
>       cd $SPARK_HOME
>       bin/spark-submit --py-files testpath/helper1.py,testpath/helper2.py 
> ~/Desktop/testpath/main.py
>       cd ~/Desktop
>       $SPARK_HOME/bin/spark-submit --py-files 
> testpath/helper1.py,testpath/helper2.py testpath/main.py
>       b. Multiple Python paths are specified using an environment variable
>       export TEST_DIR=~/Desktop/testpath
>       cd ~
>       $SPARK_HOME/bin/spark-submit --py-files 
> $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
>       
>       cd ~/Documents
>       $SPARK_HOME/bin/spark-submit --py-files 
> $TEST_DIR/helper1.py,$TEST_DIR/helper2.py ~/Desktop/testpath/main.py
>       c. Multiple paths (absolute or relative) after being preprocessed:
>       $SPARK_HOME/bin/spark-submit --py-files $(echo 
> $SPARK_HOME/testpath/helper*.py | tr ' ' ',') ~/Desktop/testpath/main.py 
>       cd ~/Desktop
>       $SPARK_HOME/bin/spark-submit --py-files $(echo testpath/helper*.py | tr 
> ' ' ',') ~/Desktop/testpath/main.py 
>       (reference link: 
> http://stackoverflow.com/questions/24855368/spark-throws-classnotfoundexception-when-using-jars-option)
> 2) Failure output: occurs if the second Python path is an absolute one; the 
> same problem happens for any further paths
>       cd ~/Documents
>       $SPARK_HOME/bin/spark-submit --py-files 
> ~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py 
> ~/Desktop/testpath/main.py 
>       py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
>       : java.io.FileNotFoundException: Added file 
> file:/Users/kunliu/Documents/~/Desktop/testpath/helper2.py does not exist.
> 1.3 Conclusions
> I suggest that the --py-files flag of spark-submit support absolute path 
> arguments throughout, not just relative paths under the working directory.
> If necessary, I would like to submit a pull request and start working on it 
> as my first contribution to the Spark community.
> 1.4 Note
> 1) I think the same issue will happen when multiple comma-delimited jar 
> files are passed to the --jars flag for Java applications.
> 2) I suggest that wildcard path arguments should also be supported, as 
> indicated by https://issues.apache.org/jira/browse/SPARK-3451
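
A possible explanation for the failure in experiment 2, offered as an assumption rather than a confirmed diagnosis: POSIX-style shells expand ~ only at the start of a word, so in ~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py only the first tilde is replaced before spark-submit sees the argument. That would match the error message, where a literal ~ is resolved against the working directory. The sketch below is a hypothetical pre-processing helper (expand_path_list is an illustrative name, not part of Spark) that expands every comma-separated entry before the list is handed to --py-files.

```python
import os


def expand_path_list(paths, cwd=None):
    """Expand '~' and make every entry of a comma-separated path list absolute.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    base = cwd if cwd is not None else os.getcwd()
    expanded = []
    for entry in paths.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty entries such as a trailing comma
        entry = os.path.expanduser(entry)  # '~/x' becomes '<home>/x'
        if not os.path.isabs(entry):
            entry = os.path.join(base, entry)  # resolve relative paths against base
        expanded.append(os.path.normpath(entry))
    return ",".join(expanded)


if __name__ == "__main__":
    # The failing argument from experiment 2, with every entry expanded
    # explicitly instead of relying on the shell.
    print(expand_path_list(
        "~/Desktop/testpath/helper1.py,~/Desktop/testpath/helper2.py"
    ))
```

Writing $HOME instead of ~ on the command line should have the same effect without any helper, which is consistent with observation 3) b. above.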



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)