dill21yu opened a new issue, #17640: URL: https://github.com/apache/dolphinscheduler/issues/17640
### Search before asking - [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement. ### Description #### ๐ Background & Problem In the current DolphinScheduler implementation, querying all task definitions under a workflow definition requires joining three tables: - `t_ds_workflow_definition` - `t_ds_workflow_task_relation` - `t_ds_task_definition` This multi-table join approach introduces significant performance bottlenecks when dealing with large numbers of workflows and tasks, especially in high-frequency query scenarios such as workflow tree view rendering and status statistics. #### ๐ Optimization Goal By adding a redundant `workflow_definition_code` field to the `t_ds_task_definition` table, we can simplify the original three-table join into a single-table query, greatly improving read performance. This optimization applies to several core APIs, including: - `GET /{code}/view-tree`: Generate workflow tree structure - `POST /query-task-definition-list`: Query task definition list - Workflow release validation logic (`/{code}/release`) - Task instance state statistics (filtered by workflowCode) --- ### ๐ก Solution ##### 1. Database Schema Changes -- Add workflow_definition_code to task definition table ALTER TABLE `t_ds_task_definition` ADD COLUMN `workflow_definition_code` BIGINT(20) NOT NULL DEFAULT 0 COMMENT 'Workflow definition code' AFTER `project_code`; -- Add fields to log table ALTER TABLE `t_ds_task_definition_log` ADD COLUMN `workflow_definition_code` BIGINT(20) NOT NULL DEFAULT 0 COMMENT 'Workflow definition code' AFTER `project_code`, ADD COLUMN `workflow_definition_version` INT(11) NOT NULL DEFAULT 0 COMMENT 'Workflow definition version' AFTER `workflow_definition_code`; ##### 2. Data Migration Scripts -- Populate task definition table from relation table UPDATE t_ds_task_definition td JOIN t_ds_workflow_task_relation wtr ON td.code = wtr.post_task_code AND td.version = wtr.post_task_version SET td.workflow_definition_code = wtr.workflow_definition_code WHERE td.workflow_definition_code = 0; -- Populate log table UPDATE t_ds_task_definition_log tdl JOIN t_ds_workflow_task_relation_log wtrl ON tdl.code = wtrl.post_task_code AND tdl.version = wtrl.post_task_version SET tdl.workflow_definition_code = wtrl.workflow_definition_code, tdl.workflow_definition_version = wtrl.workflow_definition_version WHERE tdl.workflow_definition_code = 0; ##### 3. Code-Level Changes (1) Entity Updates // TaskDefinition.java private long workflowDefinitionCode; // TaskDefinitionLog.java private long workflowDefinitionCode; private int workflowDefinitionVersion; (2) DAO Query Optimization (Key Change) List<WorkflowTaskRelationLog> relations = relationMapper.queryByWorkflowCodeAndVersion(...); Set<TaskDefinition> codes = relations.stream().map(...).collect(...); return taskDefinitionLogMapper.queryByTaskDefinitions(codes); After (single-table query): List<TaskDefinition> list = taskDefinitionMapper.queryByWorkflowDefinitionCode(workflowCode); Corresponding SQL: <select id="queryByWorkflowDefinitionCode" resultType="TaskDefinition"> SELECT <include refid="baseSql"/> FROM t_ds_task_definition WHERE workflow_definition_code = #{workflowDefinitionCode} </select> (3) Affected Core Features Module | API / Operation | Impacted -- | -- | -- Workflow Save/Update | Create/Update Task Definitions | โ Need to populate field Release Validation | /releaseย check sub-workflows | โ Query performance improved View Rendering | /{code}/view-tree | โ Faster query Task Query | POST /query-task-definition-list | โ Accelerated Statistics API | /v2/statistics/tasks/states/count | โ Optimizable Python Gateway | queryByName | โ Optimizable </body> </html> ##### 4. Business Logic Adaptation The following operations must ensure workflow_definition_code is set when saving or updating task definitions: Workflow creation & update Import/export Version rollback Copy functionality ๐ Expected Impact Significantly reduce database query load Improve frontend page loading speed (especially for large workflows) Enhance user experience and support larger-scale scheduling scenarios ### Are you willing to submit a PR? - [x] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
