danielfree commented on issue #11227:
URL: 
https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1241413973

   @SbloodyS I've seen a similar issue in 2.0.6; the log shows 
PROCESS_TIMEOUT. I suspect it is related to the unnecessary failover messages 
that appear in the log.
   
   instance id 153:
   
   [INFO] 2022-09-08 23:30:00.543 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - 
find one command: id: 154, type: SCHEDULER
   [INFO] 2022-09-08 23:30:00.546 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - 
handle command end, command 154 process 153 start...
   [INFO] 2022-09-08 23:30:00.551 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - 
add task to stand by list, task name:ssh-check, task id:0, task 
code:6565890718688
   [INFO] 2022-09-08 23:30:00.553 
org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start 
submit task : ssh-check, instance id:153, state: RUNNING_EXECUTION
   ....
   ...
   [INFO] 2022-09-08 23:31:47.712 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - 
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: 
SUCCESS task instance id: 507 process instance id: 153 context: null
   [INFO] 2022-09-08 23:31:47.713 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - 
work flow 153 task 507 state:SUCCESS
   **[INFO] 2022-09-08 23:40:27.862 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started**
   [INFO] 2022-09-08 23:40:27.865 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - 
need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-08 23:40:27.869 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - 
start master[dolphin-master-1.dolphin-master-headless:5678] failover, process 
list size:2
   [INFO] 2022-09-08 23:40:27.871 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - 
master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms
   [INFO] 2022-09-08 23:45:04.780 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - 
handle process instance : 153 , events count:1
   [INFO] 2022-09-08 23:45:04.780 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - 
already exists handler process size:0
   **[INFO] 2022-09-08 23:45:04.780 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - 
process event: State Event : key : null type: PROCESS_TIMEOUT executeStatus: 
null task instance id: 0 process instance id: 153 context: null**
   [INFO] 2022-09-08 23:50:27.872 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started
   [INFO] 2022-09-08 23:50:27.875 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - 
need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-08 23:50:27.879 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - 
start master[dolphin-master-1.dolphin-master-headless:5678] failover, process 
list size:2
   [INFO] 2022-09-08 23:50:27.881 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - 
master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms
   
   I'm deploying DolphinScheduler on k8s with two masters. My question is: why 
does this 'need failover' happen so often? Could it be related to some 
ZooKeeper configuration?
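
   To quantify how often the failover check fires, the interval between
consecutive "failover execute started" entries can be measured from the master
log. The following is only a rough sketch, not part of DolphinScheduler: the
function names `failover_times` and `intervals_seconds` are hypothetical, and it
assumes every entry starts with `[LEVEL] yyyy-MM-dd HH:mm:ss.SSS` as in the
excerpt above, possibly wrapped across lines.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "[INFO] 2022-09-08 23:40:27.862"
TS_RE = re.compile(
    r"\[(?:INFO|WARN|ERROR|DEBUG)\] "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
)


def failover_times(log_text, marker="failover execute started"):
    """Return timestamps of log entries whose message contains `marker`."""
    # Collapse whitespace so entries wrapped over several lines become one string.
    flat = " ".join(log_text.split())
    matches = list(TS_RE.finditer(flat))
    times = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next timestamp (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        if marker in flat[m.end():end]:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def intervals_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Two entries copied from the log excerpt above, including the line wrapping.
SAMPLE = """
   [INFO] 2022-09-08 23:40:27.862
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] -
failover execute started
   [INFO] 2022-09-08 23:40:27.865
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] -
need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-08 23:50:27.872
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] -
failover execute started
"""

if __name__ == "__main__":
    print(intervals_seconds(failover_times(SAMPLE)))
```

   For the two entries in the excerpt (23:40:27 and 23:50:27) the gap is about
ten minutes. Running the same check over a full master log would show whether
the failover runs are periodic or irregular.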


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
