Reamer edited a comment on pull request #4097: URL: https://github.com/apache/zeppelin/pull/4097#issuecomment-843240046
Hi @zjffdu,

> We can leverage yarn's resource cache mechanism. That means the same conda env downloaded by yarn_app_1 can be reused by yarn_app_2. If we download it in JupyterKernelInterpreter.java, it may cause network congestion if many python interpreters runs at the same time.

I was aware of this, but downloading dependencies several times is exactly how `spark.archives` behaves as well. Clearly, this is not optimal.

> Don't need to update conda archives to hdfs, just use the local file system to store the conda env. This would make the development much smooth, user use the local conda env to verify it in local environment and then move it to yarn environment in production environment.

By "local", do you mean the local file system of the Zeppelin server? In my environment, the Zeppelin user does not have access to the Zeppelin server's local file system, so I would prefer a remote endpoint that is under the control of the Zeppelin user.

I understand your development approach and it sounds great, but I don't think it is suitable for a production environment. Maybe we could support the download in `JupyterKernelInterpreter.java` behind an additional property. Then it would not matter whether the files were provided by YARN or by the download.
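The property-gated fallback suggested above could look roughly like the sketch below. The property names (`zeppelin.interpreter.conda.download`, `zeppelin.interpreter.conda.url`), the class, and the resolution logic are all illustrative assumptions, not existing Zeppelin API: use a YARN-localized archive when one is present, and only fall back to a download when the property enables it.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed toggle, NOT Zeppelin API:
 * prefer a conda env already localized (and cached) by YARN, and
 * otherwise download it only if an assumed interpreter property
 * "zeppelin.interpreter.conda.download" is set to true.
 */
public class CondaEnvResolver {
    private final Map<String, String> properties;

    public CondaEnvResolver(Map<String, String> properties) {
        this.properties = properties;
    }

    /** Decide where the conda env for this interpreter run comes from. */
    public String resolve(File yarnLocalizedEnv) {
        if (yarnLocalizedEnv != null && yarnLocalizedEnv.exists()) {
            // YARN already localized the archive; its resource cache
            // lets other apps on the same node reuse it.
            return "yarn:" + yarnLocalizedEnv.getPath();
        }
        if (Boolean.parseBoolean(
                properties.getOrDefault("zeppelin.interpreter.conda.download", "false"))) {
            // Fall back to a remote endpoint under the user's control.
            return "download:" + properties.get("zeppelin.interpreter.conda.url");
        }
        return "none";
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("zeppelin.interpreter.conda.download", "true");
        props.put("zeppelin.interpreter.conda.url", "https://example.org/envs/py38.tar.gz");
        CondaEnvResolver resolver = new CondaEnvResolver(props);
        // No YARN-localized dir in this sketch, so the download path wins.
        System.out.println(resolver.resolve(null));
    }
}
```

With such a toggle, the same code path would serve both deployments: on YARN the download is simply never triggered, while standalone setups enable the property.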