bullyb1911 commented on issue #13636: URL: https://github.com/apache/dolphinscheduler/issues/13636#issuecomment-1710219178
Experienced this issue in a Pseudo-Cluster deployment. All servers were stopped and then restarted a few minutes later using the packaged start/stop scripts. After the restart, the servers began creating scheduled workflow instances again. However, the instances that had been pending while the servers were offline did not resume and were left in a Recover Serial Wait state. The scheduler kept creating new executions while those pending instances stayed in Recover Serial Wait. The pending instances were only terminated once they eventually timed out.

There are roughly 400 scheduled jobs, and each one uses the default timeout settings. The schedule has been taken offline in Cron Manage to prevent new scheduled executions. Users have to wait for the stuck instances to time out before they can run the workflow. As a workaround, I copied the workflow config into a temporary project to get the required tasks to run.

If the workflow can tolerate concurrent runs, changing its execution type to parallel should avoid this.

This is on version 3.1.7.
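The pile-up described above is what a serial-wait queue would produce when its head is stuck. The following is a toy model, not DolphinScheduler's actual scheduler code. It assumes the workflow uses a serial-wait execution type and that one recovered instance holds the workflow's serial slot until its timeout fires. It also assumes each later instance runs for a fixed number of ticks. The function `start_times` and its parameters are hypothetical names used only for illustration.

```python
def start_times(execution_type, fire_times, slot_free_at, run_ticks=1):
    """Toy model (hypothetical): tick at which each scheduled instance starts.

    fire_times:   ticks at which the scheduler creates new instances.
    slot_free_at: tick at which the stuck, recovered instance times out
                  and releases the workflow's serial slot (assumption).
    run_ticks:    how long each new instance runs once started (assumption).
    """
    if execution_type == "parallel":
        # Parallel: every instance starts as soon as it is fired,
        # regardless of the stuck instance.
        return list(fire_times)
    if execution_type != "serial_wait":
        raise ValueError(f"unknown execution type: {execution_type}")

    starts = []
    # The stuck instance occupies the only slot until its timeout.
    free_at = slot_free_at
    for fired in fire_times:  # FIFO: instances wait in firing order
        start = max(fired, free_at)
        starts.append(start)
        free_at = start + run_ticks  # next instance waits for this one
    return starts
```

With fires at ticks 0, 10, 20 and 30, a slot that frees at tick 100 and 5-tick runs, the parallel case returns [0, 10, 20, 30]. The serial-wait case returns [100, 105, 110, 115], so every queued instance is delayed until the stuck one times out.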
