Cool, excellent meeting
From: boyi <[email protected]>
Date: Thursday, November 19, 2020 at 15:18
To: [email protected] <[email protected]>
Subject: [Online Meeting Invitation] DolphinScheduler refactor meeting (Second)

Hi, DolphinScheduler Community:

At 2020-11-17 19:00 Beijing time we discussed refactoring the DolphinScheduler workflow definition storage structure (splitting the large JSON data). More than ten community members took part in this meeting. The discussion is summarized as follows:

1. The current proposal splits the workflow definition table (t_ds_process_definition) into two tables: t_ds_process_definition [subject table] and t_ds_process_definition_task [task detail table]. The data structures are as follows (a DDL sketch of this split is attached at the end of this mail):

t_ds_process_definition [subject table]

| name | type | describe |
| id | int(11) | primary key |
| name | varchar(255) | process definition name |
| version | int(11) | process definition version |
| release_state | tinyint(4) | release state: 0 not online, 1 online |
| project_id | int(11) | project id |
| user_id | int(11) | id of the user who owns the definition |
| description | text | process definition description |
| global_params | text | global parameters |
| flag | tinyint(4) | whether the process is available: 0 not available, 1 available |
| receivers | text | recipients |
| receivers_cc | text | CC recipients |
| create_time | datetime | create time |
| timeout | int(11) | timeout |
| tenant_id | int(11) | tenant id |
| update_time | datetime | update time |
| modify_by | varchar(36) | name of the user who last modified it |

t_ds_process_definition_task [task detail table]

| name | type | describe |
| id | int(11) | task id |
| name | varchar(255) | task name |
| type | varchar(64) | task type [SHELL, PYTHON, DATAX, SPARK, etc.] |
| process_definition_id | int(11) | process definition id |
| params | longtext | custom parameters [JSON; keeps the original params field; whether custom parameters and resource file parameters should be split out is still open] |
| description | text | description |
| runFlag | tinyint(4) | run flag |
| conditionResult | longtext | conditional branch [JSON] |
| dependence | longtext | task dependency [JSON] |
| maxRetryTimes | tinyint(4) | max retry times |
| retryInterval | tinyint(4) | retry interval |
| timeout | varchar(128) | timeout control strategy [JSON] |
| taskInstancePriority | varchar(16) | task instance priority |
| workerGroup | varchar(64) | worker group name |
| preTasks | varchar(128) | predecessor tasks |
| locations | text | DAG node coordinates |
| connects | text | DAG edge information |
| resource | varchar(255) | resource file identifiers, comma-separated |
| datasource | varchar(255) | data source identifiers, comma-separated |

2. Consider whether a third table is needed to store the dependencies between tasks; splitting workflow dependency nodes, condition judgments, and task lineage into a third table would give them a higher-level abstraction (a hypothetical shape is sketched at the end of this mail).

3. Issues to discuss at the next meeting:
3.1. Should the workflow definition table be split into two tables or three?
3.2. How should data sources and resource files be stored?
3.3. How should workflow task instances be stored so that re-running is unaffected and existing features such as editing workflow definitions keep working?
3.4. The workflow definition versioning problem.

We are very grateful to the following friends for their discussion: dailidong, lgcareer, CalvinKirs, Rubik-W, leonbao, zixi0825, JinyLeeChina, chenxingchun, BoYiZhang, etc. They provided many effective suggestions for this meeting. At the same time, the community hopes that more people will participate. Thank you very much.
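For concreteness, here is a minimal MySQL DDL sketch of the two-table split from point 1. Column names and types are copied from the tables above; the primary keys, NOT NULL constraints, and the index on process_definition_id are illustrative assumptions, not meeting decisions.

```sql
-- Sketch of the proposed split: columns mirror the tables above;
-- keys, constraints, and the index are assumptions for illustration only.
CREATE TABLE t_ds_process_definition (
  id            int(11)      NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  name          varchar(255) DEFAULT NULL COMMENT 'process definition name',
  version       int(11)      DEFAULT NULL COMMENT 'process definition version',
  release_state tinyint(4)   DEFAULT NULL COMMENT '0 not online, 1 online',
  project_id    int(11)      DEFAULT NULL COMMENT 'project id',
  user_id       int(11)      DEFAULT NULL COMMENT 'owner user id',
  description   text         COMMENT 'description',
  global_params text         COMMENT 'global parameters',
  flag          tinyint(4)   DEFAULT NULL COMMENT '0 not available, 1 available',
  receivers     text         COMMENT 'recipients',
  receivers_cc  text         COMMENT 'CC recipients',
  create_time   datetime     DEFAULT NULL COMMENT 'create time',
  timeout       int(11)      DEFAULT NULL COMMENT 'timeout',
  tenant_id     int(11)      DEFAULT NULL COMMENT 'tenant id',
  update_time   datetime     DEFAULT NULL COMMENT 'update time',
  modify_by     varchar(36)  DEFAULT NULL COMMENT 'last modified by user name',
  PRIMARY KEY (id)
);

CREATE TABLE t_ds_process_definition_task (
  id                    int(11)      NOT NULL AUTO_INCREMENT COMMENT 'task id',
  name                  varchar(255) DEFAULT NULL COMMENT 'task name',
  type                  varchar(64)  DEFAULT NULL COMMENT 'SHELL, PYTHON, DATAX, SPARK, etc.',
  process_definition_id int(11)      NOT NULL COMMENT 'links each task row to its definition',
  params                longtext     COMMENT 'custom parameters (JSON)',
  description           text         COMMENT 'description',
  runFlag               tinyint(4)   DEFAULT NULL COMMENT 'run flag',
  conditionResult       longtext     COMMENT 'conditional branch (JSON)',
  dependence            longtext     COMMENT 'task dependency (JSON)',
  maxRetryTimes         tinyint(4)   DEFAULT NULL COMMENT 'max retry times',
  retryInterval         tinyint(4)   DEFAULT NULL COMMENT 'retry interval',
  timeout               varchar(128) DEFAULT NULL COMMENT 'timeout control strategy (JSON)',
  taskInstancePriority  varchar(16)  DEFAULT NULL COMMENT 'task instance priority',
  workerGroup           varchar(64)  DEFAULT NULL COMMENT 'worker group name',
  preTasks              varchar(128) DEFAULT NULL COMMENT 'predecessor tasks',
  locations             text         COMMENT 'DAG node coordinates',
  connects              text         COMMENT 'DAG edge information',
  resource              varchar(255) DEFAULT NULL COMMENT 'resource file ids, comma-separated',
  datasource            varchar(255) DEFAULT NULL COMMENT 'data source ids, comma-separated',
  PRIMARY KEY (id),
  KEY idx_process_definition_id (process_definition_id)
);
```

The intent of the split is that a single task row can then be read or edited without deserializing the whole workflow JSON blob.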
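Point 2 is still an open question, but for illustration only, a third table that stores one task-to-task edge per row could look like the following. The table name, columns, and condition_type encoding here are hypothetical; nothing of the sort was agreed at the meeting.

```sql
-- Hypothetical relation table for point 2; every name below is an assumption.
CREATE TABLE t_ds_process_definition_task_relation (
  id                    int(11)    NOT NULL AUTO_INCREMENT COMMENT 'relation id',
  process_definition_id int(11)    NOT NULL COMMENT 'workflow the edge belongs to',
  pre_task_id           int(11)    NOT NULL COMMENT 'upstream task id',
  post_task_id          int(11)    NOT NULL COMMENT 'downstream task id',
  condition_type        tinyint(4) DEFAULT NULL COMMENT 'hypothetical: 0 none, 1 condition judgment, 2 workflow dependency',
  create_time           datetime   DEFAULT NULL COMMENT 'create time',
  update_time           datetime   DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_process_definition_id (process_definition_id)
);
```

Storing each edge as its own row would let workflow dependency nodes, condition judgments, and task lineage be queried with plain joins instead of JSON parsing, which is the higher-level abstraction point 2 asks about.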
Best wishes!

BoYiZhang
--------------------------------------
BoYi Zhang
E-mail: [email protected]
