[
https://issues.apache.org/jira/browse/TOREE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luciano Resende resolved TOREE-476.
-----------------------------------
Fix Version/s: Not Applicable
Resolution: Invalid
PySpark is no longer supported. Please use a Python kernel such as IPython instead.
> PySpark Magic failed on yarn cluster mode
> -----------------------------------------
>
> Key: TOREE-476
> URL: https://issues.apache.org/jira/browse/TOREE-476
> Project: TOREE
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: heyang wang
> Priority: Major
> Fix For: Not Applicable
>
>
> I am trying to use PySpark via the %%PySpark magic in a Scala notebook, following
> [https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb].
> In Spark local mode the example works fine. However, when run in YARN
> mode, the Spark executor complains about not finding pyspark.
>
> After reading the source code, I came to understand that the %%PySpark magic
> actually uses the same Spark context created by the spark-submit command run
> by Toree. A Spark context created this way does not by default contain any
> settings related to Python or PySpark, which causes the Spark executor to fail
> when run in YARN mode. I had to add _--conf
> 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_
> manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in
> YARN mode.
>
> I think it would be good to add this setting by default, or at least document it
> somewhere, since running the %%PySpark magic in YARN mode is far more useful
> than in local mode.
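For readers hitting the same problem, the workaround described above amounts to adding the executor PYTHONPATH to the Toree kernel spec. A minimal kernel.json sketch follows; the Spark and py4j paths are the ones from the report, while the display_name and argv path are illustrative assumptions that depend on the local installation:

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "__TOREE_SPARK_OPTS__": "--master yarn --conf spark.executorEnv.PYTHONPATH=/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip"
  }
}
```

The spark.executorEnv.PYTHONPATH entry tells each YARN executor where to find the pyspark package and the bundled py4j zip, which is what the executors were missing in the report.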
--
This message was sent by Atlassian Jira
(v8.3.4#803005)