Fixed. Can be deleted.

On Thursday, September 13, 2018 at 19:05:39 UTC+2, Chia-liang Kao wrote:
>
> Hi,
>
> 1. For the user home PVC, make sure you have the correct fsGid configured.
> If you use a docker-stacks (jupyter/*) based notebook image, it should
> also try to chown the user home directory properly before su-ing to the
> jovyan user.
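For anyone who finds this later with the same root:root ownership problem:
fsGid is a top-level option in the singleuser section of values.yaml. A
minimal sketch, assuming the zero-to-jupyterhub chart and the docker-stacks
convention of uid 1000 / gid 100 ("users"); adjust both to whatever user
your custom image actually runs as:

    singleuser:
      uid: 1000    # the uid the notebook process runs as
      fsGid: 100   # applied as the pod's fsGroup, so supported volume
                   # types are made group-writable for this gid on mount

One caveat: fsGroup only affects volume types that support ownership
management, and plain hostPath mounts are not among them, which would
explain why the hostPath-mounted directory below still shows up as
root:root.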
> 2. Is your single-user image built with the tensorflow-gpu or the
> tensorflow package? Beware that conda can pull the non-GPU version from
> mixed channels even if you explicitly install tensorflow-gpu.

Jupyter Notebook didn't give me any log messages, but having a look at the
logs in a Python terminal showed me that my test graphics card was not
compatible.
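For reference, a quick way to check point 2 is from a plain Python terminal
inside the notebook container, where TensorFlow's CUDA loading errors show
up on stderr instead of being swallowed by the notebook UI. A sketch using
the TF 1.x test API, not our exact session:

    import tensorflow as tf

    print(tf.__version__)                  # which build is installed
    print(tf.test.is_built_with_cuda())    # False => a CPU-only package
    print(tf.test.gpu_device_name())       # '' if no usable GPU is found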
> 3. limit: 0 does not take away the GPUs. You need to configure
> NVIDIA_VISIBLE_DEVICES=none as an extra env in this case.

The incompatibility of my graphics card was also the problem here. (For the
GPU-limiting configuration itself, see the values.yaml sketch at the end of
this message.)

> Best,
> clkao
>
> Benedikt Bäumle <[email protected]> wrote on Thursday, September 13, 2018
> at 6:53 PM:
>
>> Hey guys,
>>
>> I am currently setting up a Kubernetes bare-metal single-node cluster
>> plus JupyterHub to have control over resources for our users. I use Helm
>> to set up JupyterHub with a custom singleuser-notebook image for deep
>> learning.
>>
>> The idea is to set up the hub to have better control over the NVIDIA
>> GPUs on the server.
>>
>> I am struggling with some things I can't figure out how to do, or
>> whether they are even possible:
>>
>> 1. I mount the home directory of the user (in our case /home/dbvis/) in
>> the helm chart values.yaml:
>>
>>   extraVolumes:
>>     - name: home
>>       hostPath:
>>         path: /home/{username}
>>   extraVolumeMounts:
>>     - name: home
>>       mountPath: /home/dbvis/data
>>
>> It is indeed mounted like this, but with root:root ownership, and I
>> can't add/remove/change anything inside the container at
>> /home/dbvis/data. What I tried:
>>
>> - changing the ownership in the Dockerfile by running
>>   'chown -R dbvis:dbvis /home/dbvis/' at the end, as the root user
>> - the following postStart hook in values.yaml:
>>
>>   lifecycleHooks:
>>     postStart:
>>       exec:
>>         command: ["chown", "-R", "dbvis:dbvis", "/home/dbvis/data"]
>>
>> Neither worked. As storage class I set up rook with rook-ceph-block
>> storage. Any ideas?
>>
>> 2. We have several NVIDIA GPUs, and I would like to control them and set
>> limits for the jupyter singleuser notebooks. I set up the NVIDIA device
>> plugin (https://github.com/NVIDIA/k8s-device-plugin). When I use
>> 'kubectl describe node' I find the GPU as a resource:
>>
>>   Allocatable:
>>    cpu:                16
>>    ephemeral-storage:  189274027310
>>    hugepages-1Gi:      0
>>    hugepages-2Mi:      0
>>    memory:             98770548Ki
>>    nvidia.com/gpu:     1
>>    pods:               110
>>   ...
>>   Allocated resources:
>>     (Total limits may be over 100 percent, i.e., overcommitted.)
>>     Resource        Requests     Limits
>>     --------        --------     ------
>>     cpu             2250m (14%)  4100m (25%)
>>     memory          2238Mi (2%)  11146362880 (11%)
>>     nvidia.com/gpu  0            0
>>   Events: <none>
>>
>> Inside the jupyter singleuser notebooks I can see the GPU when executing
>> 'nvidia-smi'. But if I run e.g. tensorflow to list the GPU with the
>> following code:
>>
>>   from tensorflow.python.client import device_lib
>>
>>   device_lib.list_local_devices()
>>
>> I just get the CPU device:
>>
>>   [name: "/device:CPU:0"
>>    device_type: "CPU"
>>    memory_limit: 268435456
>>    locality {
>>    }
>>    incarnation: 232115754901553261]
>>
>> Any idea what I am doing wrong?
>>
>> Further, I would like to limit the number of GPUs (this is just a test
>> environment with one GPU; we have more). I tried the following, which
>> doesn't seem to have any effect:
>>
>> - adding the following config to values.yaml, in every combination
>>   possible:
>>
>>   extraConfig: |
>>     c.Spawner.notebook_dir = '/home/dbvis'
>>     c.Spawner.extra_resource_limits = {'nvidia.com/gpu': '0'}
>>     c.Spawner.extra_resource_guarantees = {'nvidia.com/gpu': '0'}
>>     c.Spawner.args = ['--device=/dev/nvidiactl', '--device=/dev/nvidia-uvm',
>>                       '--device=/dev/nvidia-uvm-tools', '--device=/dev/nvidia0']
>>
>> - adding the GPU to the resources in the singleuser configuration in
>>   values.yaml:
>>
>>   singleuser:
>>     image:
>>       name: benne4444/dbvis-singleuser
>>       tag: test3
>>     nvidia.com/gpu:
>>       limit: 1
>>       guarantee: 1
>>
>> Is what I am trying even possible right now?
>>
>> Further information: the server runs
>>
>> - Ubuntu 18.04.1 LTS
>> - nvidia-docker
>> - helm jupyterhub version 0.8-ea0cf9a
>>
>> I attached the complete values.yaml. If you need additional information,
>> please let me know. Any help is appreciated a lot.
>>
>> Thank you,
>> Benedikt
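To close the loop on limiting GPUs: as far as I can tell, the chart's own
extraResource options are the supported way to do what my ad-hoc
nvidia.com/gpu block above attempted; they map to KubeSpawner's
extra_resource_limits and extra_resource_guarantees. A sketch for
values.yaml, assuming the zero-to-jupyterhub 0.8 chart:

    singleuser:
      extraResource:
        limits:
          nvidia.com/gpu: "1"
        guarantees:
          nvidia.com/gpu: "1"
      # Per point 3 above, a limit of 0 does not hide the devices; to give
      # pods no GPU at all, mask them from the container runtime instead:
      # extraEnv:
      #   NVIDIA_VISIBLE_DEVICES: "none"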
