Reamer commented on pull request #4097:
URL: https://github.com/apache/zeppelin/pull/4097#issuecomment-831361414


   You are right, for the PySpark Zeppelin interpreter we should use `spark.archives` to enable the Python (conda) environment.
   For the Python Zeppelin interpreter we should use a configuration parameter that does almost the same as the Spark equivalent.
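   For reference, a minimal sketch of how `spark.archives` is typically combined with a conda-pack'ed environment (Spark 3.1+). The property names come from the Spark documentation; the archive and alias names are just placeholders:

   ```properties
   # Packed once on the client, e.g. with: conda pack -n myenv -o pyspark_conda_env.tar.gz
   spark.archives        pyspark_conda_env.tar.gz#environment
   spark.pyspark.python  ./environment/bin/python
   ```

   The `#environment` suffix is the alias under which Spark unpacks the archive into the working directory of the driver and executors.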
   
   > But for python interpreter, I don't think there's unified approach for that for now. But we can introduce unified configuration for that. e.g. We can introduce `python.archive` which will be translated to yarn/k8s specific configuration.
   
   The current approach seems to upload the conda environment to HDFS, which looks quite effective, as the Zeppelin server and the Zeppelin interpreter process within YARN share the same content.
   A common approach for Docker, K8s, and YARN could be a dynamic download, triggered by the Zeppelin interpreter just before it starts the Python process.
   At the moment I don't know whether `spark.archives` supports downloading via HTTP; I will find out as soon as possible. If it does, `python.archive` should also handle the download, so that you don't have to pack Python (conda) environments multiple times. A sketch of the interpreter-side download follows below.
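   To illustrate the interpreter-side part, here is a minimal Python sketch of such a dynamic download, assuming an HTTP(S) URL to a conda-pack'ed archive (the function and file names are made up; this is not existing Zeppelin code):

   ```python
   import os
   import shutil
   import subprocess
   import tarfile
   import tempfile
   import urllib.request

   def launch_python_from_archive(archive_url: str, script: str) -> None:
       """Download a packed conda environment, unpack it, and start the
       Python process from it. Caching, error handling and conda-pack's
       `conda-unpack` activation step are omitted for brevity."""
       workdir = tempfile.mkdtemp(prefix="zeppelin-python-env-")
       archive = os.path.join(workdir, "env.tar.gz")

       # Dynamic download just before the Python process starts.
       with urllib.request.urlopen(archive_url) as resp, open(archive, "wb") as out:
           shutil.copyfileobj(resp, out)

       # Unpack under an alias directory, mirroring the "#environment"
       # convention of spark.archives.
       env_dir = os.path.join(workdir, "environment")
       with tarfile.open(archive) as tar:
           tar.extractall(env_dir)

       # Start Python from the unpacked environment.
       subprocess.run([os.path.join(env_dir, "bin", "python"), script], check=True)
   ```

   Because the download happens inside the interpreter process itself, the same mechanism would work unchanged in Docker, on K8s, and on YARN.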
   

