cdmikechen opened a new issue #895:
URL: https://github.com/apache/submarine/issues/895


   At present, the Jupyter image is very large, so when users deploy the 
Jupyter service on a new k8s cluster or node, they face a long wait while the 
image is pulled.
   This issue is to discuss the design of a CRD-based operator that integrates 
with the existing submarine services and behaves in a controllable and 
predictable way. Based on a new CRD, the operator can automatically trigger an 
image pull on every suitable node before the Jupyter service is deployed, so 
that every such node in k8s already has the corresponding image.
   
   For this, we need to create a CRD that contains the list of images to 
pre-pull, the refresh interval, and the pull secret for each image (if 
necessary). An example resource for this CRD looks like this:
   ```yaml
   apiVersion: org.apache.submarine/v1
   kind: JupyterImagePuller
   metadata:
     name: example-image-puller
     namespace: submarine
   spec:
     images: # the list of images to pre-pull
       - name: jupyter # environment name
         image: apache/submarine:jupyter-notebook-0.7.0 # image name
       - name: jupyter-gpu
         image: xxx.harbor.com/5000/apache/submarine:jupyter-notebook-gpu-0.7.0
         auth: # docker registry authentication
           username: xxxx
           password: xxxx
           email: [email protected] # Optional
       - name: jupyter 
         image: apache/submarine:jupyter-notebook-0.7.0
         auth: 
           secret: xxxx # name of an existing secret, if one is already defined
     refreshHours: '2' # number of hours between health checks
     nodeSelector: {} # node selector applied to pods created by the daemonset
   ```
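
For entries that carry an inline `username` / `password` instead of a secret 
name, the operator would presumably have to create a pull secret itself. The 
following is a minimal sketch of that step, not the project's implementation: 
the helper names `registry_of` and `build_pull_secret` are hypothetical, and it 
assumes the standard `kubernetes.io/dockerconfigjson` Secret layout.

```python
import base64
import json


def registry_of(image: str) -> str:
    """Return the registry host of an image reference (Docker Hub if none is given)."""
    first, _, rest = image.partition("/")
    # The first path component counts as a registry only if it looks like a host.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "https://index.docker.io/v1/"


def build_pull_secret(name: str, namespace: str, image: str, auth: dict) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest from inline credentials."""
    token = base64.b64encode(
        f"{auth['username']}:{auth['password']}".encode()
    ).decode()
    entry = {
        "username": auth["username"],
        "password": auth["password"],
        "auth": token,
    }
    if auth.get("email"):  # email is optional in the CRD example
        entry["email"] = auth["email"]
    docker_config = {"auths": {registry_of(image): entry}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            ".dockerconfigjson": base64.b64encode(
                json.dumps(docker_config).encode()
            ).decode()
        },
    }
```

The resulting dict can be serialized and applied like any other manifest. An 
entry that already names a secret would skip this step.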
   
   Every time submarine updates the environments, it will also update the 
image list in the custom resource. After the operator reads the spec and sees 
an addition or modification, it can create a `DaemonSet` in the specified 
namespace (with the given nodeSelector). The `DaemonSet` will contain N 
containers (N being the size of the image list), one per image in the spec, so 
that each node pulls every image.
   Each container overrides the entrypoint of its docker image with a command 
that only prints something like "Pulling complete", so it's a lightweight task.
   ```yaml
   spec:
     initContainers:
       - name: image-pull-{image-name}
         image: "{image}" # the image to pre-pull, taken from the image list
         command:
           - /bin/sh
           - -c
           - echo "Pulling complete"
   ```
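
To make the mapping from the spec to the `DaemonSet` concrete, here is a rough 
sketch of how an operator could assemble the manifest. It is an illustration 
under assumptions, not the proposed implementation: the function name 
`build_puller_daemonset`, the `app` label, and the `pause` main container (and 
its image tag) are hypothetical additions. A main container is included because 
a pod needs at least one regular container that keeps running after the init 
containers finish. Inline username/password entries are ignored here and would 
first have to be turned into a secret.

```python
# Assumption: any tiny long-running image works as the main container.
PAUSE_IMAGE = "registry.k8s.io/pause:3.9"


def build_puller_daemonset(name: str, namespace: str, spec: dict) -> dict:
    """Build a DaemonSet manifest with one init container per distinct image."""
    init_containers, pull_secrets, seen = [], [], set()
    for item in spec.get("images", []):
        # Collect referenced secret names so the kubelet can pull private images.
        secret = (item.get("auth") or {}).get("secret")
        if secret and {"name": secret} not in pull_secrets:
            pull_secrets.append({"name": secret})
        if item["image"] in seen:
            continue  # the same image listed twice is pulled only once
        seen.add(item["image"])
        init_containers.append({
            "name": f"image-pull-{item['name']}",
            "image": item["image"],
            # Override the entrypoint so the container exits right after the pull.
            "command": ["/bin/sh", "-c", 'echo "Pulling complete"'],
        })

    labels = {"app": name}
    pod_spec = {
        "initContainers": init_containers,
        "containers": [{"name": "pause", "image": PAUSE_IMAGE}],
        "nodeSelector": spec.get("nodeSelector") or {},
    }
    if pull_secrets:
        pod_spec["imagePullSecrets"] = pull_secrets
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
```

The sketch assumes that distinct images use distinct environment names, since 
container names within a pod must be unique, and that every image ships 
`/bin/sh`.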
   
   ---
   Some parts still need to be designed and will be explained later:
   - TODO 1: Design of the docker image version update strategy
   - TODO 2: Design of a strategy for handling repeated (duplicate) common base 
jupyter/jupyter-gpu images in multi-tenant scenarios
   - TODO 3: Design of docker image registry authorization
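
As one possible starting point for TODO 2, and only under the assumption that 
"repeated" refers to the same image being listed by several tenants, the 
operator could merge the image lists of all resources before creating pull 
containers. The function name `merge_image_lists` and the per-namespace input 
shape are hypothetical.

```python
from collections import defaultdict


def merge_image_lists(specs_by_namespace: dict) -> dict:
    """Map each distinct image reference to the sorted namespaces that request it."""
    requested = defaultdict(set)
    for namespace, spec in specs_by_namespace.items():
        for item in spec.get("images", []):
            requested[item["image"]].add(namespace)
    return {image: sorted(namespaces) for image, namespaces in requested.items()}
```

With such a mapping, a common base image requested by several tenants would 
need only one pull container per node, and it can be dropped once no namespace 
requests it any more.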


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


