This is an automated email from the ASF dual-hosted git repository.
wilfreds pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git
The following commit(s) were added to refs/heads/master by this push:
new 323bd21a33 [YUNIKORN-1968] Update "Run NVIDIA GPU Jobs in workload" (#331)
323bd21a33 is described below
commit 323bd21a337c77b2e76de40595896f77941b732b
Author: KatLantyss <[email protected]>
AuthorDate: Mon Nov 13 19:22:21 2023 +1100
[YUNIKORN-1968] Update "Run NVIDIA GPU Jobs in workload" (#331)
Closes: #331
Signed-off-by: Wilfred Spiegelenburg <[email protected]>
---
docs/user_guide/workloads/run_nvidia.md | 51 +++++++++++++++++++--------------
1 file changed, 29 insertions(+), 22 deletions(-)
diff --git a/docs/user_guide/workloads/run_nvidia.md b/docs/user_guide/workloads/run_nvidia.md
index 644910851f..b75648851c 100644
--- a/docs/user_guide/workloads/run_nvidia.md
+++ b/docs/user_guide/workloads/run_nvidia.md
@@ -26,10 +26,10 @@ under the License.
-->
## Yunikorn with NVIDIA GPUs
-This guide gives an overview of how to set up NVIDIA Device Plugin which enable user to run GPUs with Yunikorn, for more details please check [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#option-2-installing-kubernetes-using-kubeadm).
+This guide gives an overview of how to set up the NVIDIA Device Plugin, which enables users to run GPU workloads with Yunikorn. For more details, please check the [**NVIDIA device plugin for Kubernetes**](https://github.com/NVIDIA/k8s-device-plugin#nvidia-device-plugin-for-kubernetes).
### Prerequisite
-Before following the steps below, Yunikorn need to deploy on the [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#install-kubernetes).
+Before following the steps below, Yunikorn needs to be deployed on a Kubernetes cluster with GPUs.
### Install NVIDIA Device Plugin
Add the nvidia-device-plugin helm repository.
@@ -42,14 +42,17 @@ helm repo list
Verify the latest release version of the plugin is available.
```
helm search repo nvdp --devel
-NAME                        CHART VERSION  APP VERSION  DESCRIPTION
-nvdp/nvidia-device-plugin   0.12.3         0.12.3       A Helm chart for ...
+NAME                        CHART VERSION  APP VERSION  DESCRIPTION
+nvdp/nvidia-device-plugin   0.14.1         0.14.1       A Helm chart for ...
```
Deploy the device plugin.
```
kubectl create namespace nvidia
-helm install --generate-name nvdp/nvidia-device-plugin --namespace nvidia --version 0.12.3
+helm install nvidia-device-plugin nvdp/nvidia-device-plugin \
+ --namespace nvidia \
+ --create-namespace \
+ --version 0.14.1
```
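As an optional sanity check (not part of the original steps), the state of the Helm release itself can be confirmed before inspecting the pods; this is a generic `helm` command, not something specific to the device plugin:

```shell
# Optional: confirm the Helm release installed cleanly. STATUS should
# read "deployed" and the CHART column should show the requested version.
helm list --namespace nvidia
```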
Check the status of the pods to ensure the NVIDIA device plugin is running.
@@ -65,25 +68,29 @@ kube-system   kube-apiserver-katlantyss-nzxt            1/1     Running   4 (11
 kube-system   kube-controller-manager-katlantyss-nzxt   1/1     Running   3 (11h ago)   11h
 kube-system   kube-proxy-4wz7r                          1/1     Running   1 (11h ago)   11h
 kube-system   kube-scheduler-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
-kube-system   nvidia-device-plugin-1659451060-c92sb     1/1     Running   1 (11h ago)   11h
+nvidia        nvidia-device-plugin-1659451060-c92sb     1/1     Running   1 (11h ago)   11h
```
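Once the plugin pod is running, the nodes should advertise `nvidia.com/gpu` as an allocatable resource. A quick way to confirm this (a sketch, assuming a cluster where the plugin is installed; node names and counts will differ) is:

```shell
# List each node's allocatable GPU count; a non-zero value means the
# device plugin registered its GPUs with the kubelet on that node.
# The dotted resource key must be escaped with '\.' in custom-columns.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```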
### Testing NVIDIA Device Plugin
Create a GPU test YAML file.
-```
+```yaml
# gpu-pod.yaml
- apiVersion: v1
- kind: Pod
- metadata:
- name: gpu-operator-test
- spec:
- restartPolicy: OnFailure
- containers:
- - name: cuda-vector-add
- image: "nvidia/samples:vectoradd-cuda10.2"
- resources:
- limits:
- nvidia.com/gpu: 1
+apiVersion: v1
+kind: Pod
+metadata:
+ name: gpu-pod
+spec:
+ restartPolicy: Never
+ containers:
+ - name: cuda-container
+ image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
+ resources:
+ limits:
+ nvidia.com/gpu: 1 #requesting 1 GPU
+ tolerations:
+ - key: nvidia.com/gpu
+ operator: Exists
+ effect: NoSchedule
```
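If Yunikorn is not configured as the cluster's default scheduler, the pod spec would additionally need to name Yunikorn as its scheduler and carry the usual Yunikorn application metadata. The following is a sketch, not part of the original example; the `applicationId` and `queue` values are placeholders:

```yaml
# Sketch: additions to gpu-pod.yaml so the pod is scheduled by Yunikorn
# (only needed when Yunikorn is not the default scheduler).
metadata:
  name: gpu-pod
  labels:
    applicationId: "gpu-pod-0001"   # placeholder application ID
    queue: "root.default"           # placeholder queue name
spec:
  schedulerName: yunikorn
```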
Deploy the application.
```
@@ -91,14 +98,14 @@ kubectl apply -f gpu-pod.yaml
```
Check the logs to ensure the app completed successfully.
```
-kubectl get pods gpu-operator-test
+kubectl get pod gpu-pod
NAME READY STATUS RESTARTS AGE
-gpu-operator-test 0/1 Completed 0 9d
+gpu-pod 0/1 Completed 0 9d
```
Check the result.
```
-kubectl logs gpu-operator-test
+kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]