Adam Binford created ZEPPELIN-5276:
--------------------------------------

             Summary: Pyspark interpreter doesn't add jars to PYTHONPATH for YARN cluster mode
                 Key: ZEPPELIN-5276
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5276
             Project: Zeppelin
          Issue Type: Bug
            Reporter: Adam Binford


When using native spark-submit to run a Python script directly, Spark adds
all the jars resolved from --jars and --packages to the PYTHONPATH. This lets
some packages (like delta.io) automagically add their Python packages to your
session.
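
For illustration, a minimal sketch assuming delta-core 0.8.0 is pulled in via
--packages; with native spark-submit the Python package bundled in the jar is
importable out of the box:

    # e.g. spark-submit --packages io.delta:delta-core_2.12:0.8.0 app.py
    # The resolved jar ends up on the PYTHONPATH, so the Python package
    # it bundles can be imported directly:
    from delta.tables import DeltaTable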

Because the Pyspark interpreter is launched from a jar during spark-submit,
you don't automatically get that behavior. The PySparkInterpreter should add
the jars to the PYTHONPATH for you when bootstrapping the Python session. I
don't know whether this affects only YARN cluster mode or other modes as
well, as it's the only mode we use.
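
As a rough sketch of what that bootstrap step could look like (this assumes
the resolved jars are localized into the container's working directory, as
in the workaround below):

    import glob
    import sys

    # Hypothetical bootstrap step: append every localized jar to sys.path,
    # mirroring what spark-submit does for native Python apps. Python's
    # zipimport can import packages stored at the root of a jar.
    for jar in glob.glob("./*.jar"):
        if jar not in sys.path:
            sys.path.append(jar)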

Currently, you can manually work around this by setting your PYTHONPATH
directly when creating your session; you just need to know the naming format
Spark uses for the jars it saves:

PYTHONPATH=./io.delta_delta-core_2.12-0.8.0.jar
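
Alternatively (an untested sketch of the same workaround), the jar can be
added at runtime from a %pyspark paragraph before importing anything from it;
the jar name follows Spark's <group>_<artifact>-<version>.jar convention:

    import sys

    # Runtime equivalent of the PYTHONPATH workaround above:
    sys.path.insert(0, "./io.delta_delta-core_2.12-0.8.0.jar")

    from delta.tables import DeltaTable  # now importable from the jar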



