[GitHub] [dolphinscheduler] jieguangzhou opened a new issue, #9724: [Discussion][Machine Learning] Support AI task and the open source project about MLops

GitBox Sun, 24 Apr 2022 07:26:09 -0700


jieguangzhou opened a new issue, #9724:
URL: https://github.com/apache/dolphinscheduler/issues/9724


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   I read a post on Medium about a machine learning platform. The post describes how the Lizhi Machine Learning Platform is combined with Apache DolphinScheduler.
   
https://medium.com/@DolphinScheduler/a-formidable-combination-of-lizhi-machine-learning-platform-dolphinscheduler-creates-new-paradigm-e445938f1af
   
   Inspired by that post, I tried to build something similar.
   Figure 1 shows the startup screen of the training workflow.
   <img width="496" alt="image" src="https://user-images.githubusercontent.com/31528124/164979694-b55493e8-c8e7-4fda-9489-5d23b929cf05.png">
   In this workflow, I implemented four algorithms (SVM, LR, LGBM, XGBoost) using the APIs of scikit-learn, LightGBM, and XGBoost.
   Each algorithm's parameters are passed as the value of the "params" key, as semicolon-separated key=value pairs. In this case, the LGBM parameters are "n_estimators=200;num_leaves=20".
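
To make the "params" format concrete, here is a minimal sketch of how such a string could be split into individual key/value pairs in a Shell task. The function name `parse_params` is hypothetical and is not part of the workflow or of the mlflow_sklearn_gallery project, which may parse the string differently.

```shell
# Hypothetical helper (an assumption, not taken from the issue):
# split "k1=v1;k2=v2" into one "key value" line per parameter.
# The body runs in a subshell, so the IFS and globbing changes do not leak.
parse_params() (
  IFS=';'
  set -f   # disable filename globbing while the string is split
  for pair in $1; do
    # Skip empty segments such as the one produced by "a=1;;b=2"
    [ -n "$pair" ] || continue
    # Everything before the first '=' is the key, the rest is the value
    printf '%s %s\n' "${pair%%=*}" "${pair#*=}"
  done
)

parse_params "n_estimators=200;num_leaves=20"
# prints:
# n_estimators 200
# num_leaves 20
```

Splitting on the first `=` only means a value may itself contain `=` without being truncated.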
   
   Experiment tracking is handled by MLflow. The picture below shows the experiment report.
   
![image](https://user-images.githubusercontent.com/31528124/164980165-41b03a61-e830-4bfd-9138-2a4335a67f7d.png)
   The model is registered every time the workflow runs.
   
![image](https://user-images.githubusercontent.com/31528124/164980244-74ed3f86-d9d2-4f82-9263-daa4ab5ddc3a.png)
   
   
   Once the model is trained, we can run the deployment workflow, like this:
   
   <img width="495" alt="image" src="https://user-images.githubusercontent.com/31528124/164980301-451e8fa1-c105-493b-9415-5c292fb1c3de.png">
   
   We can deploy the version 2 model to the k8s cluster.
   
   <img width="500" alt="image" src="https://user-images.githubusercontent.com/31528124/164980398-35dd3ec8-17b9-4603-b57e-ebb0c666b7d2.png">
   
   We can then see the deployment and its pods:
   
![image](https://user-images.githubusercontent.com/31528124/164980439-bb6634fd-e777-4aa2-927e-948c83cb6006.png)
   
   At the same time, we can access the service through the interface.
   
![image](https://user-images.githubusercontent.com/31528124/164980478-b84c088a-1980-4fe1-8a29-2a728b587599.png)
   
   
   BTW, we can also connect the training workflow with the deployment workflow 
as a sub-workflow, like this.
   
   <img width="613" alt="image" src="https://user-images.githubusercontent.com/31528124/164980627-63517bcd-aca0-4e74-b16e-0113381ef279.png">
   
   
   
   ### What you expected to happen
   
   None
   
   ### How to reproduce
   
    None
   
   ### Anything else
   
   The workflows above are built on the Shell task, but that is too complex for ML engineers. I would like to write new task types that make this easier for users.
   
   
   The training workflow contains one task. The code is as follows:
   `data_path=${data_path}
   export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
   echo "$data_path"
   repo=https://github.com/jieguangzhou/mlflow_sklearn_gallery.git
   # Train with the selected algorithm and log the run to the given MLflow experiment
   mlflow run "$repo" -P algorithm=${algorithm} -P data_path="$data_path" \
     -P params="${params}" -P param_file=${param_file} -P model_name=${model_name} \
     --experiment-name=${experiment_name}
   
   echo "training finish"`
   
   The deployment workflow contains two tasks.
   <img width="639" alt="image" src="https://user-images.githubusercontent.com/31528124/164980832-fd6a524e-daeb-4e93-a62d-3fcbcfe3a74b.png">
   
   The code of the "build docker" task is as follows:
   `# Point the local Docker client at minikube's Docker daemon
   eval $(minikube -p minikube docker-env)
   export MLFLOW_TRACKING_URI=${MLFLOW_TRACKING_URI}
   image_name=mlflow/${model_name}:${version}
   echo "$image_name"
   # Build a serving image from the registered model version
   mlflow models build-docker -m "models:/${model_name}/${version}" -n "$image_name" --enable-mlserver`
   
   The code of the "create deployment" task, which deploys the model to the k8s cluster, is as follows:
   `version_lower=$(echo "${version}" | tr '[:upper:]' '[:lower:]')
   kubectl apply -f - << END
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: mlflow-${model_name}-$version_lower
   spec:
     selector:
       matchLabels:
         app: mlflow
     replicas: 3 # tells deployment to run 3 pods matching the template
     template:
       metadata:
         labels:
           app: mlflow
       spec:
         containers:
         - name: mlflow-iris
           image: mlflow/${model_name}:${version}
           imagePullPolicy: IfNotPresent
           ports:
           - containerPort: 8080
   
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: mlflow-${model_name}-$version_lower
   spec:
     ports:
     - port: 8080
       targetPort: 8080
     selector:
       app: mlflow
   END
   
   sleep 5s
   
   kubectl port-forward deployment/mlflow-${model_name}-$version_lower ${deployment_port}:8080`
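
The script above lowercases the version because Kubernetes resource names must be lowercase. A model name that contains uppercase letters or underscores would still be rejected, since names are limited to lowercase alphanumerics and hyphens (Deployment names also allow dots). As a rough sketch, the name could be normalized in one place before it is used in the manifest. The function `k8s_name` is a hypothetical helper and not part of the original task.

```shell
# Hypothetical helper (an assumption, not taken from the issue):
# build a Kubernetes-safe resource name from a model name and a version.
k8s_name() {
  printf '%s' "mlflow-$1-$2" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9-]/-/g' \
          -e 's/-\{2,\}/-/g' \
          -e 's/^-//' -e 's/-$//'
  # sed steps: replace disallowed characters with '-', collapse runs of '-',
  # then trim a leading or trailing '-'.
}

k8s_name "iris_model" "V2"
# prints: mlflow-iris-model-v2
```

The result could then replace both occurrences of `mlflow-${model_name}-$version_lower` in the manifest and in the `kubectl port-forward` call. Note that two different model names can normalize to the same resource name, so this is only suitable when model names are already close to the allowed form.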
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

