[ 
https://issues.apache.org/jira/browse/TOREE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende resolved TOREE-476.
-----------------------------------
    Fix Version/s: Not Applicable
       Resolution: Invalid

PySpark is no longer supported. Please use a Python kernel such as IPython.

> PySpark Magic failed on yarn cluster mode
> -----------------------------------------
>
>                 Key: TOREE-476
>                 URL: https://issues.apache.org/jira/browse/TOREE-476
>             Project: TOREE
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: heyang wang
>            Priority: Major
>             Fix For: Not Applicable
>
>
> I am trying to use PySpark via the %%PySpark magic in a Scala notebook, following 
> [https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb].
>  In Spark local mode, the example works fine. However, when run in YARN 
> mode, the Spark executor complains about not finding pyspark.
>  
> After reading the source code, I came to understand that the %%PySpark magic 
> actually uses the same Spark context created by the spark-submit command run 
> by Toree. The Spark context created this way doesn't contain any setting 
> related to Python or PySpark by default, which causes the Spark executor to 
> fail when run in YARN mode. I had to add _--conf 
> 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_
>  manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in 
> YARN mode.
>  
> I think it would be good to add this setting by default, or at least document it 
> somewhere, since running the %%PySpark magic in YARN mode is far more powerful 
> than running it in local mode.
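
For context, the workaround described in the report would look roughly like the
following in the kernel's kernel.json. This is a sketch, not an official Toree
configuration: the display name, master/deploy-mode flags, and the Spark and
py4j paths are taken from the reporter's environment and must be adjusted to
match the local Spark installation.

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": ["/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh",
           "--profile", "{connection_file}"],
  "env": {
    "__TOREE_SPARK_OPTS__": "--master yarn --deploy-mode client --conf spark.executorEnv.PYTHONPATH=/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip"
  }
}
```

The key line is the spark.executorEnv.PYTHONPATH entry, which makes the pyspark
and py4j sources visible to executors launched on YARN nodes.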



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
