[ https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157683#comment-15157683 ]

Carlos Bribiescas edited comment on SPARK-10795 at 2/22/16 8:58 PM:
--------------------------------------------------------------------

Using this command to submit the job:
{code}
spark-submit --master yarn-cluster --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 MyPythonFile.py
{code}

If MyPythonFile.py looks like this:
{code}
from pyspark import SparkContext

jobName="My Name"
sc = SparkContext(appName=jobName)

{code}
Then everything is fine.  If MyPythonFile.py does not create a SparkContext (as one would omit in the interactive shell, where the context is provided automatically), then it gives the error you describe.  Using the following file instead, I'm able to reproduce the bug:

{code}
from pyspark import SparkContext

jobName="My Name"
# sc = SparkContext(appName=jobName)

{code}

So I suspect you just didn't define a SparkContext properly for cluster mode.  
Hope this helps.
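
For reference, here is a minimal sketch of a complete MyPythonFile.py that should run cleanly in yarn-cluster mode.  The parallelize/sum job and the sc.stop() call are illustrative additions of mine, not part of the original report:

{code}
from pyspark import SparkContext

jobName = "My Name"
# In cluster mode the driver does not get a context for free,
# so create it explicitly.
sc = SparkContext(appName=jobName)

# Illustrative work (an assumption, not from the report), just to
# confirm the context is usable on the cluster.
rdd = sc.parallelize(range(100))
print(rdd.sum())

# Release the YARN containers when the job finishes.
sc.stop()
{code}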





> FileNotFoundException while deploying pyspark job on cluster
> ------------------------------------------------------------
>
>                 Key: SPARK-10795
>                 URL: https://issues.apache.org/jira/browse/SPARK-10795
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>         Environment: EMR 
>            Reporter: Harshit
>
> I am trying to run a simple Spark job using PySpark.  It works standalone, 
> but when I deploy it to the cluster it fails.
> Events:
> 2015-09-24 10:38:49,602 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) 
> - Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> 
> hdfs://ip-xxxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> The resource file upload above is successful; I manually checked that the 
> file is present at the specified path, but after a while I get the 
> following error:
> Diagnostics: File does not exist: 
> hdfs://ip-xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-1xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip


