[ https://issues.apache.org/jira/browse/TOREE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
heyang wang updated TOREE-476:
------------------------------
Description:

I am trying to use PySpark via the %%PySpark magic in a Scala notebook, following [https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb]. In Spark local mode the example works fine, but when run in YARN mode the Spark executors complain that they cannot find pyspark.

After reading the source code, I understand that the %%PySpark magic reuses the SparkContext created by the spark-submit command that Toree runs. A SparkContext created this way carries no Python- or PySpark-related settings by default, which is why the executors fail in YARN mode. I have to add _--conf 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'_ manually to __TOREE_SPARK_OPTS__ in kernel.json to make the magic work in YARN mode.

I think it would be good to set this by default, or at least document it somewhere, since running the %%PySpark magic in YARN mode is far more useful than in local mode.
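For reference, the workaround described above amounts to a kernel.json along these lines. This is a minimal sketch: only the --conf value and the Spark paths come from this report; the display_name, argv path, and SPARK_HOME entries are illustrative assumptions for a typical Toree installation and will differ per environment.

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/usr/local/spark-2.3.0-bin-hadoop2.7",
    "__TOREE_SPARK_OPTS__": "--master yarn --conf 'spark.executorEnv.PYTHONPATH'='/usr/local/spark-2.3.0-bin-hadoop2.7/python:/usr/local/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip'"
  }
}
```

With this in place, spark.executorEnv.PYTHONPATH is forwarded to the YARN executors, so the Python worker processes can import pyspark and py4j.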
> PySpark Magic failed on yarn cluster mode
> -----------------------------------------
>
>                 Key: TOREE-476
>                 URL: https://issues.apache.org/jira/browse/TOREE-476
>             Project: TOREE
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: heyang wang
>            Priority: Major

--
This message was sent by Atlassian JIRA (v7.6.3#76005)