+1
------------------ Original Message ------------------
From: "dev" <[email protected]>
Date: Friday, November 27, 2020, 3:53 PM
To: "dev" <[email protected]>
Subject: Re: [DISCUSS] Process definition json split design
There are two solutions for the query timing of job details:
- The Master is only responsible for DAG scheduling. When a job is
executed, it sends the job code to the Worker, and the Worker is
responsible for querying the job details and executing the job
- The Master executes DAG scheduling, queries the job details when a job is
to be executed, and then sends them to the Worker to execute the job
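
For illustration, a minimal sketch of the two options (all names here are
hypothetical, not actual DS classes):

```java
record JobDetail(String code, String type, String params) {}

interface Worker {
    void execute(String jobCode);    // option 1: Worker looks up the details itself
    void execute(JobDetail detail);  // option 2: details arrive with the dispatch
}

interface JobDetailStore {
    JobDetail byCode(String code);   // e.g. a query against the job definition table
}

class Master {
    private final JobDetailStore store;
    Master(JobDetailStore store) { this.store = store; }

    // Option 1: send only the job code; the Worker queries the job details.
    void dispatchByCode(Worker worker, String jobCode) {
        worker.execute(jobCode);
    }

    // Option 2: the Master queries the job details, then ships them to the Worker.
    void dispatchResolved(Worker worker, String jobCode) {
        worker.execute(store.byCode(jobCode));
    }
}
```
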
--------------------
DolphinScheduler(Incubator) Committer
Hemin Wen
[email protected]
--------------------
Hemin Wen <[email protected]> ??2020??11??25?????? ????10:01??????
> Hi!
>
> About the json splitting of the workflow definition, the following is the
> design plan for splitting it into three tables.
>
> Everyone is welcome to discuss it together.
>
>
>
--------------------------------------------------------------------------------------------------------------
>
> ## 1. Current State
> The workflow definition of the current DS system includes task definition
> data and task relationship data. In the database design, task data and
> task relationship data are stored as a string-typed field
> (process_definition_json) in the workflow definition table
> (t_ds_process_definition).
>
> As workflows and tasks grow in number, the following problems arise:
>
> - Task data, relation data, and workflow data are coupled together, which
> is unfriendly to single-task scheduling scenarios, since a task must be
> created inside a workflow
>
> - Tasks cannot be reused, because each task is created inside a workflow
>
> - Maintenance cost is high: modifying any single task requires updating
> the workflow data as a whole, and it also increases the logging cost
>
> - When there are many tasks in a workflow, the efficiency of global
> search and statistical analysis is low, for example, querying which tasks
> use which data source
>
> - Poor scalability; for example, implementing data lineage in the future
> would only make workflow definitions more and more bloated
>
> - The boundaries between tasks, relations, and workflows are blurred:
> condition nodes and delay nodes are also treated as tasks, even though
> they are actually combinations of relations and conditions
>
> Based on the above pain points, we need to redefine the business
> boundaries of tasks, relations, and workflows, and redesign their data
> structures accordingly.
>
> ## 2. Design Ideas
>
> ### 2.1 Workflow, relation, job
>
> First, setting aside the current implementation, we clarify the business
> boundaries of tasks (referred to as jobs below), relations, and workflows,
> and how to decouple them (a rough sketch follows the list below):
>
> - Job: the unit of work the scheduling system actually needs to execute;
> a job contains only the data and resources needed to execute it
> - Relation: the relationship between jobs and the execution conditions,
> including execution relationships (after A completes, execute B) and
> execution conditions (after A completes and succeeds, execute B; after A
> completes and fails, execute C; 30 minutes after A completes, execute D)
> - Workflow: the carrier of a set of relations; a workflow only stores the
> relations between jobs (the DAG is a display form of the workflow and a
> way to create relations)
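>
> As a rough illustration of these boundaries, the decoupled model might
> look like the sketch below (all names are hypothetical, not actual DS
> classes):
>
> ```java
> import java.util.List;
>
> // Hypothetical sketch of the decoupled model; not the actual DS classes.
> record Job(String code, String name, String type, String params) {}
>
> // A relation is an edge: pre-job -> post-job, plus the condition that
> // must hold to traverse it (unconditional, judgment, or delay).
> record Relation(String preJobCode, String postJobCode,
>                 String conditionType, String conditionParams) {}
>
> // A workflow is only a carrier for a set of relations between jobs.
> record Workflow(String code, String name, List<Relation> relations) {}
> ```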
>
> Combining this with the functions the current DS supports, we can
> classify them as follows:
>
> - Job: dependency check, sub-process, Shell, stored procedure, SQL,
> Spark, Flink, MR, Python, HTTP, DataX, Sqoop
> - Relation: serial execution, parallel execution, aggregate execution,
> conditional branch, delayed execution
> - Workflow: the boundary of scheduling execution, containing a set of
> relations
>
> #### 2.1.1 Further refinement
>
> The job definition data is not much different from the current task
> definition data: both are composed of common fields and custom fields.
> Only the relation-related fields need to be removed.
>
> The workflow definition data is not much different from the current
> workflow definition data; just remove the json field.
>
> Relation data can be abstracted, following the classification above, into
> two nodes and one path. A node is a job, and the path carries the
> conditional rules that must be met to go from the pre-node to the
> post-node. The conditional rules include: unconditional, judgment
> condition, and delay condition.
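>
> For example, under these three rule kinds, deciding whether a path from
> pre-node to post-node may fire could look roughly like this (hypothetical
> names and parameters, not actual DS code):
>
> ```java
> import java.time.Duration;
> import java.time.Instant;
>
> enum ConditionType { NONE, JUDGMENT, DELAY }
>
> // Outcome of the pre-node, needed to evaluate the path's rule.
> record PreNodeResult(boolean success, Instant endTime) {}
>
> class PathRule {
>     // Decide whether the edge from pre-node to post-node may fire.
>     static boolean satisfied(ConditionType type, PreNodeResult pre,
>                              boolean expectSuccess, Duration delay, Instant now) {
>         return switch (type) {
>             case NONE -> true;                                   // unconditional
>             case JUDGMENT -> pre.success() == expectSuccess;     // success/failure branch
>             case DELAY -> !now.isBefore(pre.endTime().plus(delay)); // e.g. 30 min after A
>         };
>     }
> }
> ```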
>
> ### 2.2 Version Management
>
> With the business boundaries clarified, the decoupled parts become
> reference relationships: workflow to relation is one-to-many, and
> relation to job is one-to-many. Besides definition data, we also need to
> consider instance data: every time a workflow is scheduled and executed,
> a workflow instance is generated. Jobs and workflows can change, yet
> workflow instances must support viewing, rerun, failure recovery, etc.
> This requires introducing version management for the definition data:
> every time a workflow, relation, or job changes, the old version data
> must be saved and new version data generated.
>
> So the design idea here is:
>
> - Definition data needs an added version field
>
> - Each definition table needs a corresponding log table
>
> - When creating definition data, double-write to the definition table and
> the log table; when modifying definition data, save the modified version
> to the log table
>
> - The reference data in the definition tables does not need to store
> version information (it refers to the latest version); the version
> information in effect at execution time is saved in the instance data
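>
> A minimal sketch of the create/update double-write under this idea
> (hypothetical names, not actual DS code):
>
> ```java
> import java.time.Instant;
>
> interface DefinitionDao { void insert(TaskDef d); void update(TaskDef d); }
> interface LogDao { void insert(TaskDef d, String opType, String operator, Instant at); }
>
> class TaskDef { String code; int version; /* ...other definition fields... */ }
>
> class VersionedWriter {
>     private final DefinitionDao defs;
>     private final LogDao logs;
>     VersionedWriter(DefinitionDao defs, LogDao logs) { this.defs = defs; this.logs = logs; }
>
>     void create(TaskDef d, String operator) {
>         d.version = 1;
>         defs.insert(d);                                 // definition table keeps the latest
>         logs.insert(d, "add", operator, Instant.now()); // log table keeps every version
>     }
>
>     void update(TaskDef d, String operator) {
>         d.version++;                                    // each change yields a new version
>         defs.update(d);
>         logs.insert(d, "modify", operator, Instant.now());
>     }
> }
> ```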
>
> ### 2.3 Coding Design
>
> This also involves the import and export of workflow and job definition
> data. According to previous community discussion, a coding scheme needs
> to be introduced: each piece of workflow, relation, and job data will
> have a unique code. Related issue:
> https://github.com/apache/incubator-dolphinscheduler/issues/3820
>
> Resource: RESOURCE_xxx
>
> Task: TASK_xxx
>
> Relation: RELATION_xxx
>
> Workflow: PROCESS_xxx
>
> Project: PROJECT_xxx
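>
> The mail does not fix the generation scheme for the suffix (that is part
> of the linked issue), so the sketch below just assumes a prefix plus some
> globally unique suffix:
>
> ```java
> import java.util.UUID;
>
> // Hypothetical code generator; the real suffix scheme (e.g. snowflake IDs)
> // is up to the discussion in issue #3820.
> enum CodeType { RESOURCE, TASK, RELATION, PROCESS, PROJECT }
>
> final class Codes {
>     private Codes() {}
>
>     static String next(CodeType type) {
>         String suffix = UUID.randomUUID().toString().replace("-", "");
>         return type.name() + "_" + suffix; // e.g. "TASK_6f1c03..."
>     }
> }
> ```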
>
> ## 3. Design Plan
>
> ### 3.1 Table model design
>
> #### 3.1.1 Job definition table: t_ds_task_definition
>
> | Column Name | Description |
> | ----------------------- | -------------- |
> | id | Self-incrementing ID |
> | union_code | unique code |
> | version | Version |
> | name | Job name |
> | description | description |
> | task_type | Job type |
> | task_params | Job custom parameters |
> | run_flag | Run flag |
> | task_priority | Job priority |
> | worker_group | worker group |
> | fail_retry_times | Number of failed retries |
> | fail_retry_interval | Failure retry interval |
> | timeout_flag | Timeout flag |
> | timeout_notify_strategy | Timeout notification strategy |
> | timeout_duration | Timeout duration |
> | create_time | Creation time |
> | update_time | Modification time |
>
> #### 3.1.2 Task relation table: t_ds_task_relation
>
> | Column Name | Description |
> | ----------------------- | ---------------------------------------- |
> | id | Self-incrementing ID |
> | union_code | unique code |
> | version | Version |
> | process_definition_code | Workflow coding |
> | node_code | Node code (workflow code/job code) |
> | post_node_code | Post node code (workflow code/job code) |
> | condition_type | Condition type. 0: none, 1: judgment condition, 2: delay condition |
> | condition_params | Condition parameters |
> | create_time | Creation time |
> | update_time | Modification time |
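>
> As an illustration, a small DAG could land in this table as follows
> (the codes, classes, and condition_params format here are made up, not
> real schema values):
>
> ```java
> import java.util.List;
>
> // One row per edge of the DAG: node -> post-node plus the path's condition.
> record RelationRow(String processDefinitionCode, String nodeCode,
>                    String postNodeCode, int conditionType, String conditionParams) {}
>
> class Example {
>     // DAG: A -> B when A succeeds, A -> C when A fails (judgment condition, type 1)
>     static final List<RelationRow> ROWS = List.of(
>         new RelationRow("PROCESS_1", "TASK_A", "TASK_B", 1, "{\"branch\":\"success\"}"),
>         new RelationRow("PROCESS_1", "TASK_A", "TASK_C", 1, "{\"branch\":\"failure\"}")
>     );
> }
> ```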
>
> #### 3.1.3 Workflow definition table: t_ds_process_definition
>
> | Column Name | Description |
> | ---- | ---- |
> | id | Self-incrementing ID |
> | union_code | unique code |
> | version | Version |
> | name | Workflow name |
> | project_code | Project code |
> | release_state | Release state |
> | user_id | Owning user ID |
> | description | description |
> | global_params | Global parameters |
> | flag | Whether the process is available. 0: not available, 1: available |
> | receivers | recipients |
> | receivers_cc | CC |
> | timeout | Timeout time |
> | tenant_id | tenant ID |
> | create_time | Creation time |
> | update_time | Modification time |
>
> #### 3.1.4 Job definition log table: t_ds_task_definition_log
>
> Adds operation type (add, modify, delete), operator, and operation time
> columns on top of the job definition table
>
> #### 3.1.5 Job relation log table: t_ds_task_relation_log
>
> Adds operation type (add, modify, delete), operator, and operation time
> columns on top of the job relation table
>
> #### 3.1.6 Workflow definition log table: t_ds_process_definition_log
>
> Adds operation type (add, modify, delete), operator, and operation time
> columns on top of the workflow definition table
>
> ### 3.2 Frontend
>
> *The design here is just a personal idea; front-end help is needed to
> design the interaction*
>
> Job management functions need to be added, including: job list, job
> creation, update, delete, and view-details operations
>
> On the workflow creation page, the json needs to be split into workflow
> definition data and job relation data, which are sent to the back-end API
> layer to save/update
>
> On the workflow page, when dragging task nodes, add an option to
> reference an existing job
>
> Conditional branch nodes and delay nodes need to be resolved into the
> conditional rule data of the relation; conversely, when querying the
> workflow, the conditional rule data returned by the backend needs to be
> displayed as the corresponding node
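>
> A possible shape of the save/update request after the front end splits
> the old json (field names are illustrative only):
>
> ```java
> import java.util.List;
>
> // Hypothetical DTOs for the split save/update call to the API layer.
> record TaskDefinitionDto(String code, String name, String taskType, String taskParams) {}
> record TaskRelationDto(String nodeCode, String postNodeCode,
>                        int conditionType, String conditionParams) {}
> record SaveProcessRequest(String processCode, String name, String globalParams,
>                           List<TaskDefinitionDto> taskDefinitions,
>                           List<TaskRelationDto> taskRelations) {}
> ```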
>
> ### 3.3 Master
>
> When the Master schedules a workflow, <Build dag from json> needs to be
> changed to <Build dag from relational data>. When executing a workflow,
> first load the relational data in full (no job data is loaded at this
> point), generate the DAG, and traverse the DAG for execution, fetching
> the job data only when a job needs to be executed
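>
> A rough sketch of <Build dag from relational data> (hypothetical names,
> not the actual Master code):
>
> ```java
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> record Edge(String nodeCode, String postNodeCode) {}
>
> class DagBuilder {
>     // Step 1: load all relation rows of the workflow in full (no job data yet).
>     // Step 2: build an adjacency list keyed by node code.
>     static Map<String, List<String>> build(List<Edge> relations) {
>         Map<String, List<String>> dag = new HashMap<>();
>         for (Edge e : relations) {
>             dag.computeIfAbsent(e.nodeCode(), k -> new ArrayList<>()).add(e.postNodeCode());
>             dag.computeIfAbsent(e.postNodeCode(), k -> new ArrayList<>()); // keep sink nodes
>         }
>         return dag;
>     }
> }
> // Step 3: traverse the DAG; a node's job definition is fetched by its code
> // only when that node is actually submitted for execution.
> ```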
>
> The rest of the execution process is consistent with the existing process
>
> --------------------
> DolphinScheduler(Incubator) Committer
> Hemin Wen
> [email protected]
> --------------------
>