dill21yu opened a new issue, #17640:
URL: https://github.com/apache/dolphinscheduler/issues/17640

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   #### ๐Ÿž Background & Problem
   
   In the current DolphinScheduler implementation, querying all task 
definitions under a workflow definition requires joining three tables:
   
   - `t_ds_workflow_definition`
   - `t_ds_workflow_task_relation`
   - `t_ds_task_definition`
   
   This multi-table join approach introduces significant performance 
bottlenecks when dealing with large numbers of workflows and tasks, especially 
in high-frequency query scenarios such as workflow tree view rendering and 
status statistics.
   
   #### ๐Ÿš€ Optimization Goal
   
   By adding a redundant `workflow_definition_code` field to the 
`t_ds_task_definition` table, we can simplify the original three-table join 
into a single-table query, greatly improving read performance.
   
   This optimization applies to several core APIs, including:
   - `GET /{code}/view-tree`: Generate workflow tree structure
   - `POST /query-task-definition-list`: Query task definition list
   - Workflow release validation logic (`/{code}/release`)
   - Task instance state statistics (filtered by workflowCode)
   
   ---
   
   ### ๐Ÿ’ก Solution
   
   ##### 1. Database Schema Changes
   -- Add workflow_definition_code to task definition table
   ALTER TABLE `t_ds_task_definition` 
   ADD COLUMN `workflow_definition_code` BIGINT(20) NOT NULL DEFAULT 0 COMMENT 
'Workflow definition code' AFTER `project_code`;
   
   -- Add fields to log table
   ALTER TABLE `t_ds_task_definition_log` 
   ADD COLUMN `workflow_definition_code` BIGINT(20) NOT NULL DEFAULT 0 COMMENT 
'Workflow definition code' AFTER `project_code`,
   ADD COLUMN `workflow_definition_version` INT(11) NOT NULL DEFAULT 0 COMMENT 
'Workflow definition version' AFTER `workflow_definition_code`;
   
   ##### 2. Data Migration Scripts
   -- Populate task definition table from relation table
   UPDATE t_ds_task_definition td
   JOIN t_ds_workflow_task_relation wtr 
     ON td.code = wtr.post_task_code AND td.version = wtr.post_task_version
   SET td.workflow_definition_code = wtr.workflow_definition_code
   WHERE td.workflow_definition_code = 0;
   
   -- Populate log table
   UPDATE t_ds_task_definition_log tdl
   JOIN t_ds_workflow_task_relation_log wtrl 
     ON tdl.code = wtrl.post_task_code AND tdl.version = wtrl.post_task_version
   SET tdl.workflow_definition_code = wtrl.workflow_definition_code, 
       tdl.workflow_definition_version = wtrl.workflow_definition_version
   WHERE tdl.workflow_definition_code = 0;
   
   ##### 3. Code-Level Changes
    (1) Entity Updates
   // TaskDefinition.java
   private long workflowDefinitionCode;
   
   // TaskDefinitionLog.java
   private long workflowDefinitionCode;
   private int workflowDefinitionVersion;
   (2) DAO Query Optimization (Key Change)
   List<WorkflowTaskRelationLog> relations = 
relationMapper.queryByWorkflowCodeAndVersion(...);
   Set<TaskDefinition> codes = relations.stream().map(...).collect(...);
   return taskDefinitionLogMapper.queryByTaskDefinitions(codes);
   After (single-table query):
   List<TaskDefinition> list = 
taskDefinitionMapper.queryByWorkflowDefinitionCode(workflowCode);
   Corresponding SQL:
   <select id="queryByWorkflowDefinitionCode" resultType="TaskDefinition">
     SELECT <include refid="baseSql"/>
     FROM t_ds_task_definition
     WHERE workflow_definition_code = #{workflowDefinitionCode}
   </select>
   (3) Affected Core Features
   
   Module | API / Operation | Impacted
   -- | -- | --
   Workflow Save/Update | Create/Update Task Definitions | โœ… Need to populate 
field
   Release Validation | /releaseย check sub-workflows | โœ… Query performance 
improved
   View Rendering | /{code}/view-tree | โœ… Faster query
   Task Query | POST /query-task-definition-list | โœ… Accelerated
   Statistics API | /v2/statistics/tasks/states/count | โœ… Optimizable
   Python Gateway | queryByName | โœ… Optimizable
   
   </body>
   </html>
   
   ##### 4. Business Logic Adaptation
   The following operations must ensure workflow_definition_code is set when 
saving or updating task definitions:
   
   Workflow creation & update
   Import/export
   Version rollback
   Copy functionality
   
   ๐Ÿ“ˆ Expected Impact
   Significantly reduce database query load
   Improve frontend page loading speed (especially for large workflows)
   Enhance user experience and support larger-scale scheduling scenarios
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to