zhongjiajie commented on code in PR #10036:
URL: https://github.com/apache/dolphinscheduler/pull/10036#discussion_r873163428

##########
docs/docs/en/guide/task/mlflow.md:
##########
@@ -0,0 +1,117 @@
+# MLflow Node
+
+## Overview
+
+[MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
+reproducibility, deployment, and a central model registry.
+
+Mlflow task is used to perform mlflow project tasks, which include basic algorithmic and autoML capabilities (
+User-defined MLFlow project task execution will be supported in the near future)
+
+## Create Task
+
+- Click `Project -> Management-Project -> Name-Workflow Definition`, and click the "Create Workflow" button to enter the
+  DAG editing page.
+- Drag from the toolbar <img src="/img/tasks/icons/mlflow.png" width="15"/> task node to canvas.
+
+## Task Parameter
+
+- DolphinScheduler common parameters
+    - **Node name**: The node name in a workflow definition is unique.
+    - **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
+      the `prohibition execution`.
+    - **Descriptive information**: Describe the function of the node.
+    - **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high
+      to low, and tasks with the same priority will execute in a first-in first-out order.
+    - **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected,
+      randomly select a worker machine for execution.
+    - **Environment Name**: Configure the environment name in which run the script.
+    - **Times of failed retry attempts**: The number of times the task failed to resubmit.
+    - **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
+    - **Delayed execution time**: The time (unit minute) that a task delays in execution.
+    - **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm
+      email will send and the task execution will fail.
+    - **Custom parameter**: It is a local user-defined parameter for mlflow, and will replace the content
+      with `${variable}` in the script.
+    - **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
+      upstream of the current task.
+
+- MLflow task specific parameters
+    - **mlflow server tracking uri** :MLflow server uri, default http://127.0.0.1:5000.
+    - **experiment name** :The experiment in which the task is running, if none, is created.
+    - **register model** :Register the model or not. If register is selected, the following parameters are expanded.
+        - **model name** : The registered model name is added to the original model version and registered as
+          Production.
+    - **job type** : The type of task to run, currently including the underlying algorithm and AutoML. (User-defined
+      MLFlow project task execution will be supported in the near future)
+        - BasicAlgorithm specific parameters
+            - **algorithm** :The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
+              on [scikit-learn](https://scikit-learn.org/) form.
+            - **Parameter search space** : Parameter search space when running the corresponding algorithm, which can be
+              empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention
+              will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name,
+              and using the name after the equal sign to get the corresponding parameter value through `python eval()`.
+        - AutoML specific parameters
+            - **AutoML tool** : The AutoML tool used, currently
+              supports [autosklearn](https://github.com/automl/auto-sklearn)
+              and [flaml](https://github.com/microsoft/FLAML)
+        - Parameters common to BasicAlgorithm and AutoML
+            - **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
+              test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)。
+            - **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
+              parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; 'shards
+              each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
+              sign to get the corresponding parameter value through `python eval()`.
+                - BasicAlgorithm
+                    - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
+                    - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
+                    - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
+                    - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
+                - AutoML
+                    - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
+                    - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
+
+## Task Example
+
+### Preparation
+
+#### Conda env
+
+You need to enter the admin account to configure a conda environment variable(Please
+install [anaconda](https://docs.continuum.io/anaconda/install/)
+or [miniconda](https://docs.conda.io/en/latest/miniconda.html#installing ) in advance )
+
+
+
+Note During the configuration task, select the conda environment created above. Otherwise, the program cannot find the
+Conda environment.
+
+
+
+#### Start the mlflow service
+
+Make sure you have installed MLflow, using 'PIP Install MLFlow'.
+
+Create a folder where you want to save your experiments and models and start mlFlow service.
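+
+```

Review Comment:
   ```suggestion
   ```sh
   ```

##########
docs/docs/zh/guide/task/mlflow.md:
##########
@@ -0,0 +1,97 @@
+# MLflow Node
+
+## Overview
+
+[MLflow](https://mlflow.org) is an excellent open source project in the MLOps field for managing the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
+
+The MLflow task is used to run MLflow Project tasks, which include preset basic algorithm and AutoML capabilities (running user-defined mlflow project tasks will be supported in the near future).
+
+## Create Task
+
+- Click Project Management - Project Name - Workflow Definition, then click the "Create Workflow" button to enter the DAG editing page;
+- Drag the <img src="/img/tasks/icons/mlflow.png" width="15"/> task node from the toolbar onto the canvas.
+
+## Task Parameters
+
+- DS common parameters
+    - **Node name** : Set the name of the task. Node names are unique within a workflow definition.
+    - **Run flag** : Indicates whether this node can be scheduled normally; if it does not need to run, turn on the prohibit-execution switch.
+    - **Description** : Describes the function of the node.
+    - **Task priority** : When there are not enough worker threads, tasks run in order of priority from high to low; tasks with the same priority run first-in, first-out.
+    - **Worker group** : The task is assigned to a machine in the worker group; if Default is selected, a worker machine is chosen at random.
+    - **Environment name** : Configure the environment in which the script runs.
+    - **Failed retry attempts** : The number of times a failed task is resubmitted.
+    - **Failed retry interval** : The interval, in minutes, before a failed task is resubmitted.
+    - **Delayed execution time** : How long the task is delayed before execution, in minutes.
+    - **Timeout alarm** : Check timeout alarm and timeout failure; when the task exceeds the "timeout duration", an alarm email is sent and the task fails.
+    - **Custom parameters** : User-defined parameters local to mlflow; they replace the ${variable} placeholders in the script
+    - **Predecessor task** : Selecting a predecessor task for the current task sets it as upstream of the current task.
+
+- MLflow task specific parameters
+    - **mlflow server tracking uri** : The connection URI of the MLflow server, default http://127.0.0.1:5000.
+    - **Experiment name** : The experiment in which the task runs; it is created if it does not exist.
+    - **Register model** : Whether to register the model; if selected, the following parameters are expanded.
+        - **Registered model name** : The name of the registered model; a new model version is added on top of the existing ones and registered as Production.
+    - **Job type** : The type of task to run, currently basic algorithms and AutoML; user-defined ML Projects will be supported later.
+        - Parameters specific to basic algorithms
+            - **Algorithm** : The selected algorithm; `lr`, `svm`, `lightgbm`, and `xgboost` are currently supported in [scikit-learn](https://scikit-learn.org/) form.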
+            - **Parameter search space** : The parameter search space used when running the algorithm; may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm
+              triggers the corresponding search. By convention, the input is split on `;` into individual parameters; the name before the equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the parameter value
+        - Parameters specific to AutoML
+            - **AutoML tool** : The AutoML tool to use; currently supports [autosklearn](https://github.com/automl/auto-sklearn)
+              , [flaml](https://github.com/microsoft/FLAML)
+        - Parameters common to BasicAlgorithm and AutoML
+            - **Data path** : Absolute path of a file or folder. A file must end with .csv (the training and test sets are split automatically); a folder must contain train.csv and test.csv (recommended; users should build their own test set for model evaluation).
+            - **Parameters** : Parameters used when initializing the model/AutoML trainer; may be empty, e.g. `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, the input is split on `;`
+              into individual parameters; the name before the equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the parameter value. The detailed parameter lists are as follows:
+                - BasicAlgorithm
+                    - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
+                    - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
+                    - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
+                    - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
+                - AutoML
+                    - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
+                    - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
+
+## Task Example
+
+### Preparation
+
+#### Conda environment configuration
+
+You need to log in to the admin account to configure a conda environment variable (please [install anaconda](https://docs.continuum.io/anaconda/install/)
+or [install miniconda](https://docs.conda.io/en/latest/miniconda.html#installing) in advance)
+
+
+
+Note that when configuring the task later, select the conda environment created above; otherwise the program cannot find the conda environment
+
+
+
+#### Start the mlflow service
+
+Make sure you have installed mlflow; you can install it with `pip install mlflow`
+
+Create a folder where you want to save your experiments and models, then start the mlflow service
+
+```

Review Comment:
   ```suggestion
   ```sh
   ```

##########
dolphinscheduler-task-plugin/dolphinscheduler-task-mlflow/src/main/java/org/apache/dolphinscheduler/plugin/task/mlflow/MlflowConstants.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.dolphinscheduler.plugin.task.mlflow;
+
+public class MlflowConstants {
+    private MlflowConstants() {
+        throw new IllegalStateException("Utility class");
+    }
+
+    public static final String JOB_TYPE_AUTOML = "AutoML";
+
+    public static final String JOB_TYPE_BASIC_ALGORITHM = "BasicAlgorithm";
+
+    public static final String PRESET_AUTOML_PROJECT = "https://github.com/jieguangzhou/MLflow-AutoML";
+
+    public static final String PRESET_BASIC_ALGORITHM_PROJECT = "https://github.com/jieguangzhou/mlflow_sklearn_gallery";

Review Comment:
   I see that these constants point to your own repositories. Why are they used in our codebase?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
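
Note on the parameter convention described in the documentation hunks above: the docs say the input is split on `;`, the text before each equals sign becomes the parameter name, and the text after it is evaluated to get the value. The following is only a minimal sketch of that convention, not code from this PR or from the preset MLflow projects. `parse_params` is a hypothetical helper name, and `ast.literal_eval` is deliberately swapped in for the `eval()` the docs mention, on the assumption that values are plain Python literals such as numbers, strings, and lists.

```python
import ast


def parse_params(raw: str) -> dict:
    """Parse 'name=value;name=value' into a dict (hypothetical helper)."""
    params = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and a trailing ';'
        # Everything before the first '=' is the name, the rest is the value.
        name, _, value = item.partition("=")
        # Assumption: values are Python literals, so literal_eval is enough;
        # the documented behaviour uses eval(), which also runs arbitrary code.
        params[name.strip()] = ast.literal_eval(value.strip())
    return params


print(parse_params("max_depth=[5, 10];n_estimators=[100, 200]"))
print(parse_params("time_budget=30;estimator_list=['lgbm']"))
```

Because the split on `;` happens before evaluation, a value that itself contains a semicolon would be cut in two under this sketch; the documentation does not say how the real implementation handles that case.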
