Adam Binford created ZEPPELIN-5276:
--------------------------------------
Summary: Pyspark interpreter doesn't add jars to PYTHONPATH for
yarn cluster mode
Key: ZEPPELIN-5276
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5276
Project: Zeppelin
Issue Type: Bug
Reporter: Adam Binford
When using the native spark-submit to run a python script directly, Spark adds
all the resolved jars from --jars and --packages to the PYTHONPATH. This lets
some packages (like delta.io) automagically add their python packages to your
session.
Because the Pyspark interpreter is launched from a jar during the spark submit,
you don't automatically get that behavior. The PysparkInterpreter should add
the jars to the python path for you when bootstrapping the python session. I
don't know if this only affects yarn cluster mode or other modes as well, as
it's the only one we use.
Currently, you can manually work around this by setting your PYTHONPATH
directly when creating your session; you just need to know the naming format
Spark uses when it localizes jars:
PYTHONPATH=./io.delta_delta-core_2.12-0.8.0.jar
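As a rough illustration of what the interpreter could do at bootstrap (a
hedged sketch, not Zeppelin's actual code): in yarn cluster mode the resolved
jars are localized into the container's working directory under the
<group>_<artifact>-<version>.jar naming shown above, and Python's zipimport
can import packages bundled inside a jar once it is on sys.path. The helper
name and the directory scan below are assumptions for illustration only:

```python
import glob
import os
import sys

def add_local_jars_to_pythonpath(directory="."):
    """Append every jar in `directory` to sys.path.

    Hypothetical helper: Python's zipimport treats jars as zip archives,
    so jars that bundle a python package (e.g. delta-core's delta module)
    become importable once appended. Returns the jars that were added.
    """
    added = []
    for jar in sorted(glob.glob(os.path.join(directory, "*.jar"))):
        if jar not in sys.path:
            sys.path.append(jar)
            added.append(jar)
    return added
```

Calling this against the container's working directory before user code runs
would give the same effect as the manual PYTHONPATH workaround above.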
--
This message was sent by Atlassian Jira
(v8.3.4#803005)