Hi,

1. For the user home PVC, make sure you have the correct fsGid configured. If
you use a docker-stacks (jupyter/*) based notebook image, its start script
should also properly chown the user home directory before su-ing to the
jovyan user.
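For example, in the Z2JH helm chart values.yaml it could look roughly like the sketch below (fsGid 100 matches the GID of the "users" group in the jupyter/docker-stacks images; adjust it if your image differs):

```yaml
# Sketch for values.yaml: fsGid becomes the pod's fsGroup securityContext,
# so mounted volumes are made group-writable for the notebook user at
# pod start instead of staying root:root.
singleuser:
  uid: 1000
  fsGid: 100
```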

2. Is your single-user image built with the tensorflow-gpu package or the
plain tensorflow package? Beware that conda can pull the non-GPU build from
mixed channels even if you explicitly install tensorflow-gpu.
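To see which build conda actually resolved, run `conda list` inside the notebook container and look for a plain tensorflow entry. A small helper sketch of that check (hypothetical function, not part of any chart or image):

```python
def has_cpu_only_tensorflow(conda_list_output):
    """Return True if the plain (CPU) tensorflow package is installed
    and tensorflow-gpu is not -- a common result of mixed conda channels."""
    pkgs = set()
    for line in conda_list_output.splitlines():
        line = line.strip()
        # conda list prefixes comment lines with '#'
        if line and not line.startswith("#"):
            pkgs.add(line.split()[0])
    return "tensorflow" in pkgs and "tensorflow-gpu" not in pkgs
```

Feed it the output of `conda list` (e.g. captured via subprocess) from inside the running singleuser pod.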

3. limit: 0 does not take the GPUs away. In that case you also need to set
NVIDIA_VISIBLE_DEVICES=none as an extra environment variable.
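In the Z2JH values.yaml that would be roughly the following sketch (assuming the singleuser.extraEnv key of the chart version you are on):

```yaml
# Sketch: hide GPUs from single-user pods. With the nvidia container
# runtime, NVIDIA_VISIBLE_DEVICES=none stops the GPU devices from being
# injected into the container even when no nvidia.com/gpu limit is set.
singleuser:
  extraEnv:
    NVIDIA_VISIBLE_DEVICES: "none"
```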

Best,
clkao


Benedikt Bäumle <[email protected]> wrote on Thu, Sep 13, 2018 at 6:53 PM:

> Hey guys,
>
> I am currently setting up a bare-metal single-node Kubernetes cluster +
> JupyterHub to have control over resources for our users. I use Helm to set
> up JupyterHub with a custom singleuser-notebook image for deep learning.
>
> The idea is to set up the hub to have better control over NVIDIA GPUs on
> the server.
>
> I am struggling with a few things that I can't figure out how to do, or
> whether they are even possible:
>
> 1. I mount the user's home directory ( in our case /home/dbvis/ ) into the
> notebook via the helm chart values.yaml:
>
>     extraVolumes:
>         - name: home
>           hostPath:
>             path: /home/{username}
>     extraVolumeMounts:
>         - name: home
>           mountPath: /home/dbvis/data
>
> It is indeed mounted like this, but with root:root ownership, and I can't
> add/remove/change anything inside the container at /home/dbvis/data. What
> I have tried:
>
> - I tried to change the ownership in the Dockerfile by running 'chown -R
> dbvis:dbvis /home/dbvis/' at the end, as the root user
> - I tried through the following postStart hook in the values.yaml
>
>     lifecycleHooks:
>       postStart:
>         exec:
>           command: ["chown", "-R", "dbvis:dbvis", "/home/dbvis/data"]
>
> Neither worked. As storage class I set up Rook with rook-ceph-block
> storage.
> Any ideas?
>
>
> 2. We have several NVIDIA GPUs and I would like to control them and set
> limits for the jupyter singleuser notebooks. I set up the 'nvidia device
> plugin' ( https://github.com/NVIDIA/k8s-device-plugin ).
> When I run 'kubectl describe node' I find the GPU listed as a resource:
>
> Allocatable:
>  cpu:                16
>  ephemeral-storage:  189274027310
>  hugepages-1Gi:      0
>  hugepages-2Mi:      0
>  memory:             98770548Ki
>  nvidia.com/gpu:     1
>  pods:               110
> ...
> ...
> Allocated resources:
>   (Total limits may be over 100 percent, i.e., overcommitted.)
>   Resource        Requests     Limits
>   --------        --------     ------
>   cpu             2250m (14%)  4100m (25%)
>   memory          2238Mi (2%)  11146362880 (11%)
>   nvidia.com/gpu  0            0
> Events:           <none>
>
> Inside the jupyter singleuser notebooks I can see the GPU when executing
> 'nvidia-smi'.
> But if I run e.g. the following TensorFlow code to list the visible
> devices:
>
> from tensorflow.python.client import device_lib
>
> device_lib.list_local_devices()
>
> I just get the CPU device:
>
> [name: "/device:CPU:0"
>  device_type: "CPU"
>  memory_limit: 268435456
>  locality {
>  }
>  incarnation: 232115754901553261]
>
>
> Any idea what I am doing wrong?
>
> Further, I would like to limit the number of GPUs ( it is just a test
> environment with one GPU; we have more ). I tried the following, which
> doesn't seem to have an effect:
>
> - Add the following config in values.yaml in any combination possible:
>
>   extraConfig: |
>      c.Spawner.notebook_dir = '/home/dbvis'
>      c.Spawner.extra_resource_limits: {'nvidia.com/gpu': '0'}
>      c.Spawner.extra_resource_guarantees: {'nvidia.com/gpu': '0'}
>      c.Spawner.args = ['--device=/dev/nvidiactl',
>                        '--device=/dev/nvidia-uvm',
>                        '--device=/dev/nvidia-uvm-tools',
>                        '/dev/nvidia0']
>
> - Add the GPU to the resources in the singleuser configuration in
> values.yaml:
>
> singleuser:
>   image:
>     name: benne4444/dbvis-singleuser
>     tag: test3
>   nvidia.com/gpu:
>     limit: 1
>     guarantee: 1
>
> Is what I am trying even possible right now?
>
> Further information:
>
> I set up a server running
>
> - Ubuntu 18.04.1 LTS
> - nvidia-docker
> - helm jupyterhub version 0.8-ea0cf9a
>
> I added the complete values.yaml.
>
> If you need additional information please let me know. Any help is
> appreciated a lot.
>
> Thank you,
> Benedikt
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Project Jupyter" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jupyter/585d4d0b-5d8d-4cf2-b109-2c16f93d1f62%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
