[ https://issues.apache.org/jira/browse/TOREE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

heyang wang updated TOREE-476:
------------------------------
    Description: 
I am trying to use PySpark via the %%PySpark magic in a Scala notebook, following 
[https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb].
In Spark local mode the example works fine, but when run in YARN mode the Spark 
executors complain about not finding pyspark.

 

After reading the source code, I came to understand that the %%PySpark magic 
actually uses the same Spark context created by the spark-submit command run by 
Toree. A Spark context created this way contains no Python- or PySpark-related 
settings by default, which causes the Spark executors to complain when running 
in YARN mode. I had to add _--conf 
'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_
 manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in YARN 
mode.

 

I think it would be good to add this setting by default, or at least document it 
somewhere, since running the %%PySpark magic in YARN mode is far more powerful 
than local mode.
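
For illustration, a minimal kernel.json sketch with the flag above added to 
__TOREE_SPARK_OPTS__. The display name and argv path are assumptions for a 
typical Toree install, not taken from this report; the PYTHONPATH entries 
match the Spark 2.3.0 layout mentioned above and must be adjusted to your 
environment:

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "__TOREE_SPARK_OPTS__": "--master yarn --conf spark.executorEnv.PYTHONPATH=/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip"
  }
}
```

With this env setting, the spark-submit launched by Toree propagates a usable 
PYTHONPATH to the YARN executors, so the %%PySpark magic can import pyspark.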



> PySpark Magic failed on yarn cluster mode
> -----------------------------------------
>
>                 Key: TOREE-476
>                 URL: https://issues.apache.org/jira/browse/TOREE-476
>             Project: TOREE
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: heyang wang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
