Fixed. Can be deleted.

On Thursday, September 13, 2018 at 19:05:39 UTC+2, Chia-liang Kao wrote:
>
> Hi,
>
> 1. For the user home PVC, make sure you have the correct fsGid configured.
> If you use a docker-stacks (jupyter/*) based notebook image, it should
> also try to chown the user home directory properly before su-ing to the
> jovyan user.
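For anyone who finds this later with the same root:root ownership problem:
fsGid is a top-level option in the singleuser section of values.yaml. A
minimal sketch, assuming the zero-to-jupyterhub chart and the docker-stacks
convention of uid 1000 / gid 100 ("users"); adjust both to whatever user
your custom image actually runs as:

    singleuser:
      uid: 1000    # the uid the notebook process runs as
      fsGid: 100   # applied as the pod's fsGroup, so supported volume
                   # types are made group-writable for this gid on mount

One caveat: fsGroup only affects volume types that support ownership
management, and plain hostPath mounts are not among them, which would
explain why the hostPath-mounted directory below still shows up as
root:root.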
> 2. Is your single-user image built with the tensorflow-gpu or the
> tensorflow package? Beware that conda can pull the non-GPU version from
> mixed channels even if you explicitly install tensorflow-gpu.

Jupyter Notebook didn't give me any log messages, but having a look at the
logs in a Python terminal showed me that my test graphics card was not
compatible.
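For reference, a quick way to check point 2 is from a plain Python terminal
inside the notebook container, where TensorFlow's CUDA loading errors show
up on stderr instead of being swallowed by the notebook UI. A sketch using
the TF 1.x test API, not our exact session:

    import tensorflow as tf

    print(tf.__version__)                  # which build is installed
    print(tf.test.is_built_with_cuda())    # False => a CPU-only package
    print(tf.test.gpu_device_name())       # '' if no usable GPU is found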
> 3. limit: 0 does not take away the GPUs. You need to configure
> NVIDIA_VISIBLE_DEVICES=none as an extra env in this case.

The incompatibility of my graphics card was also the problem here. (For the
GPU-limiting configuration itself, see the values.yaml sketch at the end of
this message.)

> Best,
> clkao
>
> Benedikt Bäumle <[email protected]> wrote on Thursday, September 13, 2018
> at 6:53 PM:
>
>> Hey guys,
>>
>> I am currently setting up a Kubernetes bare-metal single-node cluster
>> plus JupyterHub to have control over resources for our users. I use Helm
>> to set up JupyterHub with a custom singleuser-notebook image for deep
>> learning.
>>
>> The idea is to set up the hub to have better control over the NVIDIA
>> GPUs on the server.
>>
>> I am struggling with some things I can't figure out how to do, or
>> whether they are even possible:
>>
>> 1. I mount the home directory of the user (in our case /home/dbvis/) in
>> the helm chart values.yaml:
>>
>>   extraVolumes:
>>     - name: home
>>       hostPath:
>>         path: /home/{username}
>>   extraVolumeMounts:
>>     - name: home
>>       mountPath: /home/dbvis/data
>>
>> It is indeed mounted like this, but with root:root ownership, and I
>> can't add/remove/change anything inside the container at
>> /home/dbvis/data. What I tried:
>>
>> - changing the ownership in the Dockerfile by running
>>   'chown -R dbvis:dbvis /home/dbvis/' at the end, as the root user
>> - the following postStart hook in values.yaml:
>>
>>   lifecycleHooks:
>>     postStart:
>>       exec:
>>         command: ["chown", "-R", "dbvis:dbvis", "/home/dbvis/data"]
>>
>> Neither worked. As storage class I set up rook with rook-ceph-block
>> storage. Any ideas?
>>
>> 2. We have several NVIDIA GPUs, and I would like to control them and set
>> limits for the jupyter singleuser notebooks. I set up the NVIDIA device
>> plugin (https://github.com/NVIDIA/k8s-device-plugin). When I use
>> 'kubectl describe node' I find the GPU as a resource:
>>
>>   Allocatable:
>>    cpu:                16
>>    ephemeral-storage:  189274027310
>>    hugepages-1Gi:      0
>>    hugepages-2Mi:      0
>>    memory:             98770548Ki
>>    nvidia.com/gpu:     1
>>    pods:               110
>>   ...
>>   Allocated resources:
>>     (Total limits may be over 100 percent, i.e., overcommitted.)
>>     Resource        Requests     Limits
>>     --------        --------     ------
>>     cpu             2250m (14%)  4100m (25%)
>>     memory          2238Mi (2%)  11146362880 (11%)
>>     nvidia.com/gpu  0            0
>>   Events: <none>
>>
>> Inside the jupyter singleuser notebooks I can see the GPU when executing
>> 'nvidia-smi'. But if I run e.g. tensorflow to list the GPU with the
>> following code:
>>
>>   from tensorflow.python.client import device_lib
>>
>>   device_lib.list_local_devices()
>>
>> I just get the CPU device:
>>
>>   [name: "/device:CPU:0"
>>    device_type: "CPU"
>>    memory_limit: 268435456
>>    locality {
>>    }
>>    incarnation: 232115754901553261]
>>
>> Any idea what I am doing wrong?
>>
>> Further, I would like to limit the number of GPUs (this is just a test
>> environment with one GPU; we have more). I tried the following, which
>> doesn't seem to have any effect:
>>
>> - adding the following config to values.yaml, in every combination
>>   possible:
>>
>>   extraConfig: |
>>     c.Spawner.notebook_dir = '/home/dbvis'
>>     c.Spawner.extra_resource_limits = {'nvidia.com/gpu': '0'}
>>     c.Spawner.extra_resource_guarantees = {'nvidia.com/gpu': '0'}
>>     c.Spawner.args = ['--device=/dev/nvidiactl', '--device=/dev/nvidia-uvm',
>>                       '--device=/dev/nvidia-uvm-tools', '--device=/dev/nvidia0']
>>
>> - adding the GPU to the resources in the singleuser configuration in
>>   values.yaml:
>>
>>   singleuser:
>>     image:
>>       name: benne4444/dbvis-singleuser
>>       tag: test3
>>     nvidia.com/gpu:
>>       limit: 1
>>       guarantee: 1
>>
>> Is what I am trying even possible right now?
>>
>> Further information: the server runs
>>
>> - Ubuntu 18.04.1 LTS
>> - nvidia-docker
>> - helm jupyterhub version 0.8-ea0cf9a
>>
>> I attached the complete values.yaml. If you need additional information,
>> please let me know. Any help is appreciated a lot.
>>
>> Thank you,
>> Benedikt
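To close the loop on limiting GPUs: as far as I can tell, the chart's own
extraResource options are the supported way to do what my ad-hoc
nvidia.com/gpu block above attempted; they map to KubeSpawner's
extra_resource_limits and extra_resource_guarantees. A sketch for
values.yaml, assuming the zero-to-jupyterhub 0.8 chart:

    singleuser:
      extraResource:
        limits:
          nvidia.com/gpu: "1"
        guarantees:
          nvidia.com/gpu: "1"
      # Per point 3 above, a limit of 0 does not hide the devices; to give
      # pods no GPU at all, mask them from the container runtime instead:
      # extraEnv:
      #   NVIDIA_VISIBLE_DEVICES: "none"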
