github-actions[bot] commented on issue #10449: URL: https://github.com/apache/dolphinscheduler/issues/10449#issuecomment-1155958748
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

We use DolphinScheduler to schedule on the order of 100,000 workflows per day. With MySQL as the metadata database and millions of tasks, the pressure on MySQL is too high.

As shown in the figure below, MySQL was initially configured with 2 cores and 4 GB of memory, and CPU usage was very high. After upgrading MySQL to 8 cores and 16 GB, CPU usage soon reached 100% again.

### What you expected to happen

After troubleshooting, we found that the three tables t_ds_task_instance, t_ds_process_instance, and t_ds_relation_process_instance had grown too large, and that no entry point is provided for deleting their data.

The high MySQL CPU usage is caused by a large number of identical statements querying the t_ds_relation_process_instance table. The query statement is as follows:

`select id, parent_process_instance_id, parent_task_instance_id, process_instance_id from t_ds_relation_process_instance where parent_process_instance_id = 667735 and parent_task_instance_id = 2454593;`

This table has no appropriate index, so every such query scans the full table. Adding an index to this table can therefore greatly relieve the pressure on MySQL.

### How to reproduce

1. Leave workflow and task parallelism at their default values, and use MySQL as the metadata database.
2. Schedule hundreds of workflows per hour, each with dozens of tasks.
3. After running for a period of time, MySQL CPU usage is saturated.

### Anything else

_No response_

### Version

2.0.5

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

-- 
This is an automated message from the Apache Git Service.
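To illustrate the index suggestion from the report, here is a minimal sketch rather than the project's actual fix. It assumes a composite index on the two columns in the reported WHERE clause; the index name `idx_parent_process_task` is hypothetical, and an in-memory SQLite table (with only the columns named in the query) stands in for MySQL so the sketch runs without a server. On MySQL the equivalent check would be to run `EXPLAIN` on the reported query before and after creating the index.

```python
import sqlite3

# Hypothetical index name; the column list follows the WHERE clause of the
# reported query. This CREATE INDEX form is accepted by both SQLite and MySQL.
CREATE_INDEX = (
    "CREATE INDEX idx_parent_process_task "
    "ON t_ds_relation_process_instance "
    "(parent_process_instance_id, parent_task_instance_id)"
)

# The query from the report, with the literal ids turned into parameters.
QUERY = (
    "select id, parent_process_instance_id, parent_task_instance_id, "
    "process_instance_id "
    "from t_ds_relation_process_instance "
    "where parent_process_instance_id = ? and parent_task_instance_id = ?"
)


def build_demo_db():
    """Create an in-memory stand-in table (assumed schema, demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_ds_relation_process_instance ("
        "id INTEGER PRIMARY KEY, "
        "parent_process_instance_id INTEGER, "
        "parent_task_instance_id INTEGER, "
        "process_instance_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t_ds_relation_process_instance "
        "(parent_process_instance_id, parent_task_instance_id, "
        "process_instance_id) VALUES (?, ?, ?)",
        [(i, i * 3, i + 1) for i in range(1000)],
    )
    return conn


def query_plan(conn, params=(667735, 2454593)):
    """Return the plan detail strings SQLite reports for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan text


conn = build_demo_db()
plan_before = query_plan(conn)  # without the index: a scan of the whole table
conn.execute(CREATE_INDEX)
plan_after = query_plan(conn)   # with the index: a search through the index
print(plan_before)
print(plan_after)
```

The column order in the index matters less here because both columns are compared with equality, but the exact plan text differs between SQLite versions and from MySQL's `EXPLAIN` output, so treat the printed plans as indicative only.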
