Reamer commented on pull request #4097:
URL: https://github.com/apache/zeppelin/pull/4097#issuecomment-843240046


   Hi @zjffdu,
   
   > We can leverage yarn's resource cache mechanism. That means the same conda env downloaded by yarn_app_1 can be reused by yarn_app_2. If we download it in JupyterKernelInterpreter.java, it may cause network congestion if many python interpreters run at the same time.
   
   I was aware of this, but downloading dependencies several times seems to be how `spark.archives` works. It is clear that this is not optimal.
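   
   For context, a minimal sketch of the `spark.archives` pattern I am referring to; the archive path and the `environment` alias are illustrative, not taken from this PR:
   
   ```python
   import os
   from pyspark.sql import SparkSession
   
   # Executors unpack the archive under the alias given after "#", so the
   # Python interpreter inside the shipped conda env can be selected like
   # this (hypothetical alias and path):
   os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"
   
   spark = (
       SparkSession.builder
       # Each application downloads and unpacks its own copy of this archive
       # unless yarn's resource cache can be reused.
       .config("spark.archives",
               "hdfs:///user/zeppelin/pyspark_conda_env.tar.gz#environment")
       .getOrCreate()
   )
   ```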
   
   > Don't need to upload conda archives to hdfs, just use the local file system to store the conda env. This would make development much smoother: users can use the local conda env to verify things in a local environment and then move to the yarn environment in production.
   
   By local, do you mean the local file system of the Zeppelin server? In my environment, the Zeppelin user does not have access to that file system, so I would prefer a remote endpoint that is under the control of the Zeppelin user.
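   
   For illustration, roughly the workflow I have in mind, assuming the env is packed locally and pushed to a remote endpoint that the Zeppelin user controls (env name and target path are hypothetical):
   
   ```python
   import conda_pack
   
   # Pack the local conda env into a relocatable archive (requires the
   # conda-pack package to be installed in the base environment).
   conda_pack.pack(name="zeppelin_env", output="zeppelin_env.tar.gz")
   
   # The archive can then be pushed to the remote endpoint, e.g. with
   # `hdfs dfs -put zeppelin_env.tar.gz /user/zeppelin/`, and referenced
   # from there via spark.archives as sketched above.
   ```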
   I understand your development approach and it sounds great, but I think this is not suitable for a production environment.

