Hello,

First, I recommend you take a look at JupyterHub, which you may already be 
doing.  Its `Kubespawner` essentially launches Notebook/Lab server pods, one 
per user.  If your use case is that each user will need to run multiple 
notebooks simultaneously, then Jupyter Enterprise Gateway would be useful 
since it spreads kernel-specific resources across the cluster.  Otherwise, a 
single Notebook/kernel instance per user amounts to just the regular feature 
set offered by Kubespawner, since the launched kernels are local to that 
same pod. 

So, assuming you do want multiple notebook/kernel instances running for a 
given user, let me try to answer your questions.  There are more details 
than I can get into at this time, but hopefully you find this helpful at a 
high level.

1. Mounting Persistent Volumes onto remote kernel pods - Is this a spec 
that can be passed?
Each kernelspec directory contains a kernel-pod.yaml file that is used to 
launch the kernel pod - assuming it's not Spark based.  (Spark-based kernel 
pods are launched by Spark itself, which doesn't yet support custom 
templates (yaml files) - that's planned for its 3.0 release.)  At any rate, 
if it's a "vanilla" kernel you're after, this can be done.  However, if you 
want different *numbers* or *attributes* of PVs for different users, things 
get a little dicey, since this is a multi-tenant environment where it's 
assumed all tenants have the same basic needs.  We're working on making 
this aspect more dynamic.
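
For illustration only, here's roughly what adding a PV mount to a vanilla 
kernel-pod.yaml could look like.  This is a minimal sketch - the pod name, 
claim name, mount path, and image tag are hypothetical, and the exact 
template contents vary by EG version:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-kernel-pod              # hypothetical; EG derives the real name
    spec:
      containers:
        - name: kernel
          image: elyra/kernel-py:dev   # hypothetical tag; use one of our kernel images
          volumeMounts:
            - name: user-data
              mountPath: /home/jovyan/data   # hypothetical mount path
      volumes:
        - name: user-data
          persistentVolumeClaim:
            claimName: my-user-pvc     # hypothetical, pre-created claim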

2. Lifecycle management of remote kernels - Is this available? If not, what 
is the current behavior of the remote kernels? 
Enterprise Gateway introduces a pluggable architecture of "process proxies", 
where each implementation knows how to perform lifecycle management for its 
respective target (k8s, Docker Swarm, Hadoop YARN, etc.).  The Notebook/Lab 
instance doesn't know it's talking to a remote kernel, so lifecycle 
management looks just like normal Notebook behavior.  Since kernels are 
remote, their startup times are slightly longer.  Jupyter already has 
built-in auto-restart logic, so when the framework detects (via polling) 
that the kernel has died, it will destroy the pod and create a new one 
within the same namespace.  As a result, the k8s restart policy is set to 
Never - otherwise Kubernetes and Enterprise Gateway would both try to 
restart the kernel.
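
Concretely, the pod template carries something along these lines, so that 
only Enterprise Gateway - not the kubelet - restarts a dead kernel:

    spec:
      restartPolicy: Never   # EG's polling/auto-restart logic owns the lifecycle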

3. Dependency management - For dependencies that have been dynamically 
installed within a remote kernel pod, is there any way that these 
dependencies can be restored in the next session when a different pod is 
spun up?
This is more a function of where the kernel places things.  For example, 
the Apache Toree (Scala) kernel uses a temp directory; there might be ways 
to have that be a PV mount, etc.  For IPython, it's probably a function of 
how the Python environment is configured - pip can install things into a 
user-relative area, so that area could target a PV.  In general, I'd say 
these dependencies will not be present in subsequent kernel instances 
without mount tricks.  If you know upfront which libraries/dependencies 
will be required, you'd probably want to look into extending our kernel 
images or building your own.
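
As one hypothetical mount trick for Python kernels - building on the PV 
sketch above, and not something EG configures for you - you could point 
PYTHONUSERBASE at the mounted path so that `pip install --user ...` lands 
on the PV and survives into the next kernel pod:

    containers:
      - name: kernel
        env:
          - name: PYTHONUSERBASE       # hypothetical: redirect pip --user installs
            value: /home/jovyan/data/python
        volumeMounts:
          - name: user-data            # the PV claim mounted as in the earlier sketch
            mountPath: /home/jovyan/data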

4. Configuring kernel pod resource specs - If I'm not wrong JKG 2.2.0 
should have already implemented this?
JKG has no support for remote kernels - k8s or otherwise; it only supports 
local kernels.  It's purely the vehicle by which kernel management is 
"detached" from the Notebook/Lab instance and made available to multiple 
requesters.  EG makes use of KERNEL_ environment variables, which get 
automatically included in the kernel startup requests sent from the 
Notebook.  As a result, one could configure kernelspecs and the 
kernel-pod.yaml file to indicate memory, CPUs, etc., with those values 
plugged in from the request.
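
For example, resource specs could live in the container section of 
kernel-pod.yaml - shown here with hard-coded values; the mechanism for 
substituting KERNEL_ values into the template depends on the EG release, so 
treat the parameterization as an assumption:

    containers:
      - name: kernel
        resources:
          requests:
            memory: 2Gi    # could be driven by a KERNEL_ variable
            cpu: "1"
          limits:
            memory: 4Gi
            cpu: "2"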

We also support things like *bring your own namespace*, where you can 
specify a pre-configured namespace - perhaps with various resource quotas, 
etc. already in place.  This is also done via a KERNEL_ variable 
(KERNEL_NAMESPACE).  If a namespace name is not provided, we create a 
namespace for each kernel pod to help with isolation; in that case, the 
namespace is removed once the kernel is finally shut down.
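
For instance, an admin might pre-create a namespace with a quota and have 
users launch with KERNEL_NAMESPACE set to it (the names and limits here are 
hypothetical):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a-kernels       # users set KERNEL_NAMESPACE=team-a-kernels
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: kernel-quota
      namespace: team-a-kernels
    spec:
      hard:
        limits.cpu: "8"
        limits.memory: 16Gi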

Please visit our docs and repo for more information and let us know what 
else you might need.  We're nearing our 2.0 release, so it would be great 
to try to address things soon.

Take care,
Kevin.

On Tuesday, March 5, 2019 at 11:21:08 PM UTC-8, Hui Si Goh wrote:
>
> Hi Jupyter experts,
>
> My team and I are working to bring JupyterLab into a multi-tenant 
> environment (Kubernetes) where each member of a particular tenant has a 
> dedicated JupyterLab client. 
>
> The following are critical features that we require:
> 1. Mounting Persistent Volumes onto remote kernel pods - Is this a spec 
> that can be passed?
> 2. Lifecycle management of remote kernels - Is this available? If not, 
> what is the current behaviour of the remote kernels? 
> 3. Dependency management - For dependencies that have been dynamically 
> installed within a remote kernel pod, is there any way that these 
> dependencies can be restored in the next session when a different pod is 
> spun up? 
> 4. Configuring kernel pod resource specs - If I'm not wrong JKG 2.2.0 
> should have already implemented this?
>
> Thank you!
