danielfree commented on issue #11227: URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1241413973
@SbloodyS I've seen a similar issue in 2.0.6: the log shows a PROCESS_TIMEOUT event. I suspect it is related to the unnecessary failover messages that appear in the log.

Log for process instance id 153:

[INFO] 2022-09-08 23:30:00.543 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 154, type: SCHEDULER
[INFO] 2022-09-08 23:30:00.546 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 154 process 153 start...
[INFO] 2022-09-08 23:30:00.551 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:ssh-check, task id:0, task code:6565890718688
[INFO] 2022-09-08 23:30:00.553 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : ssh-check, instance id:153, state: RUNNING_EXECUTION
.... ...
[INFO] 2022-09-08 23:31:47.712 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 507 process instance id: 153 context: null
[INFO] 2022-09-08 23:31:47.713 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 153 task 507 state:SUCCESS
**[INFO] 2022-09-08 23:40:27.862 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started**
[INFO] 2022-09-08 23:40:27.865 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
[INFO] 2022-09-08 23:40:27.869 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
[INFO] 2022-09-08 23:40:27.871 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms
[INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 153 , events count:1
[INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
**[INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event : key : null type: PROCESS_TIMEOUT executeStatus: null task instance id: 0 process instance id: 153 context: null**
[INFO] 2022-09-08 23:50:27.872 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
[INFO] 2022-09-08 23:50:27.875 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
[INFO] 2022-09-08 23:50:27.879 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
[INFO] 2022-09-08 23:50:27.881 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms

I'm deploying DolphinScheduler on Kubernetes with two masters. Why does this 'need failover' message appear so often? Could it be related to some configuration in ZooKeeper?
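
To put numbers on the timing in this log, here is a minimal Python sketch that parses a few of the entries quoted in this comment and measures the gaps between them. It is not DolphinScheduler code: the regular expression, the helper names `parse` and `timestamps`, and the choice of which four entries to include are assumptions made for illustration, and the bold markers are stripped from the copied lines.

```python
import re
from datetime import datetime

# Four entries copied from the log in this comment (bold markers removed).
LOG = """\
[INFO] 2022-09-08 23:30:00.543 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 154, type: SCHEDULER
[INFO] 2022-09-08 23:40:27.862 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
[INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event : key : null type: PROCESS_TIMEOUT executeStatus: null task instance id: 0 process instance id: 153 context: null
[INFO] 2022-09-08 23:50:27.872 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
"""

# Assumed layout, inferred from the entries above:
# [LEVEL] date time logger:[source line] - message
LINE_RE = re.compile(
    r"^\[(\w+)\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\S+):\[(\d+)\] - (.*)$"
)


def parse(text):
    """Return (timestamp, short logger name, message) for each matching line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            _level, ts, logger, _source_line, message = match.groups()
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
            entries.append((when, logger.rsplit(".", 1)[-1], message))
    return entries


def timestamps(entries, needle):
    """Timestamps of all entries whose message contains `needle`."""
    return [when for when, _logger, message in entries if needle in message]


entries = parse(LOG)

# Seconds between consecutive "failover execute started" entries.
failovers = timestamps(entries, "failover execute started")
failover_gaps = [(b - a).total_seconds() for a, b in zip(failovers, failovers[1:])]

# Seconds from the command being picked up to the PROCESS_TIMEOUT event.
start = timestamps(entries, "find one command")[0]
timeout = timestamps(entries, "PROCESS_TIMEOUT")[0]
timeout_delay = (timeout - start).total_seconds()

print("failover gaps (s):", failover_gaps)
print("start to PROCESS_TIMEOUT (s):", timeout_delay)
```

On these entries the two failover runs are about 600 seconds apart, and the PROCESS_TIMEOUT event arrives about 904 seconds after the command was picked up. The regular spacing may point to a periodic check rather than a reaction to a master actually going down, but the log alone does not confirm that.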
