Reamer commented on pull request #4097:
URL: https://github.com/apache/zeppelin/pull/4097#issuecomment-831361414
You are right, for the PySpark Zeppelin interpreter we should use `spark.archives` to enable the Python (conda) environment. For the Python Zeppelin interpreter we should use a configuration parameter that does almost the same as the Spark equivalent.

> But for the python interpreter, I don't think there's a unified approach for that for now. But we can introduce a unified configuration for that, e.g. we can introduce `python.archive`, which will be translated to yarn/k8s specific configuration.

The current approach uploads the conda environment to HDFS, which seems quite effective because the Zeppelin server and the Zeppelin interpreter process inside YARN share the same content. A common way for Docker, K8s and YARN could be a dynamic download, performed by the Zeppelin interpreter just before it starts the Python process (see the sketch at the end of this comment).

At the moment I don't know whether `spark.archives` supports a download via HTTP. I will find out as soon as possible. If HTTP is supported, `python.archive` should also do the download, so that you don't have to pack Python (conda) environments several times.
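For reference, a minimal sketch of how a conda environment is usually shipped via `spark.archives` (this assumes Spark 3.1+ and the `conda-pack` tool; the environment name, packages and paths are only illustrative):

```sh
# Build and pack a relocatable conda environment (requires conda-pack).
conda create -y -n pyspark_conda_env -c conda-forge conda-pack numpy
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz

# Ship the archive with the application. The fragment after '#' is the
# directory name the archive is unpacked into on every node.
export PYSPARK_PYTHON=./environment/bin/python
spark-submit \
  --conf spark.archives=pyspark_conda_env.tar.gz#environment \
  app.py
```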
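To make the dynamic-download idea concrete, here is a hypothetical sketch of what the interpreter launch could do right before starting the Python process. Nothing in it exists in Zeppelin today; the URL, paths and the `PYTHON_ARCHIVE` variable (standing in for a future `python.archive` setting) are made up:

```sh
# Hypothetical pre-launch step: fetch the environment archive over HTTP,
# unpack it, activate it and start the Python process from it.
PYTHON_ARCHIVE="https://example.org/envs/python_env.tar.gz"  # assumed python.archive value
curl -fsSL "$PYTHON_ARCHIVE" -o /tmp/python_env.tar.gz
mkdir -p /tmp/python_env
tar -xzf /tmp/python_env.tar.gz -C /tmp/python_env

# conda-pack archives ship an activation script plus conda-unpack,
# which fixes up the hard-coded path prefixes after relocation.
source /tmp/python_env/bin/activate
conda-unpack

exec python   # the interpreter would start its Python worker here
```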