JinyLeeChina commented on issue #11227:
URL: 
https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1259031353

   > > I think this has nothing to do with the master failover. Can you provide 
the logs of the instance id 153 of the worker
   > 
   > Here's another case with my running installation:
   > 
   > instance id: 211 task instance id: 672 logs from worker:
   > 
   > ```
   > [INFO] 2022-09-16 07:50:01.165 
org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[110] 
- received command : 
TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":672,"taskName":"flightradar_top_routes","firstSubmitTime":"2022-09-16
 
07:50:01","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":6629036562401,"processDefineVersion":3,"appIds":null,"processInstanceId":211,"scheduleTime":"2022-09-16
 
07:50:00","globalParams":null,"executorId":2,"cmdTypeIfComplement":6,"tenantCode":"aspex","queue":"default","projectCode":6565237764192,"taskParams":"***","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":null,"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":nu
 
ll},"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
   > [INFO] 2022-09-16 07:50:01.167 
org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success 
/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.167 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task 
instance local execute path : 
/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.167 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the 
task begins to execute. task instance id: 672
   > [INFO] 2022-09-16 07:50:01.265 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[60] - FINALIZE_SESSION
   > ...
   > [INFO] 2022-09-16 07:50:01.287 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute 
path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.291 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start, 
process id is: 24435
   > 07:50:42.439 [pool-2-thread-4] INFO 
org.apache.dolphinscheduler.server.log.LoggerRequestProcessor - received 
command : Command [type=VIEW_WHOLE_LOG_REQUEST, opaque=695701, 
genCommandTimeMillis=0, bodyLen=82]
   > [INFO] 2022-09-16 07:53:58.443 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has 
exited, execute 
path:/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672, 
processId:24435 ,exitStatusCode:0 ,processWaitForStatus:true ,processExitValue:0
   > [INFO] 2022-09-16 07:53:58.444 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[206] - task 
instance id : 672,task final status : SUCCESS
   > [INFO] 2022-09-16 07:53:58.444 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[264] - 
develop mode is: false
   > [INFO] 2022-09-16 07:53:58.445 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[282] - exec 
local path: 
/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672 
cleared.
   > [INFO] 2022-09-16 07:53:59.196 
org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success 
/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:53:59.196 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task 
instance local execute path : 
/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:53:59.196 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the 
task begins to execute. task instance id: 672
   > ```
   > 
   > Note that at **[INFO] 2022-09-16 07:53:59.196** the task instance 672 is 
executed again after a SUCCESS run
   > 
   > master logs:
   > 
   > ```
   > [INFO] 2022-09-16 07:50:00.012 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - 
find command 212, slot:0 :
   > [INFO] 2022-09-16 07:50:00.013 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - 
find one command: id: 212, type: SCHEDULER
   > [INFO] 2022-09-16 07:50:00.016 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - 
handle command end, command 212 process 211 start...
   > [INFO] 2022-09-16 07:50:00.025 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - 
add task to stand by list, task name:ssh-check, task id:0, task 
code:6629036562272
   > [INFO] 2022-09-16 07:50:00.027 
org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start 
submit task : ssh-check, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:50:00.031 
org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit 
task to db successfully:671 ssh-check state:SUBMITTED_SUCCESS complete, 
instance id:211 state: RUNNING_EXECUTION  
   > [INFO] 2022-09-16 07:50:00.037 
org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] 
- master submit success, task id:671, task name:ssh-check, process id:211
   > [INFO] 2022-09-16 07:50:00.039 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - 
remove task from stand by list, id: 671 name:ssh-check
   > [INFO] 2022-09-16 07:50:00.042 
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - 
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 
07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', 
status=1, 
logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log',
 
executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671',
 processInstanceId='0'}
   > [INFO] 2022-09-16 07:50:00.042 
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - 
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 
07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', 
status=1, 
logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log',
 
executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671',
 processInstanceId='211'}
   > [INFO] 2022-09-16 07:50:00.964 
org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] 
- received command : TaskExecuteResponseCommand{taskInstanceId=671, 
processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:50:00 
CST 2022, host=null, logPath=null, executePath=null, processId=24426, 
appIds='', varPool=[]}
   > [INFO] 2022-09-16 07:50:01.022 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
 - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:01.026 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
 - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:50:01.144 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - 
handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:50:01.145 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - 
already exists handler process size:0
   > [INFO] 2022-09-16 07:50:01.145 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - 
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: 
SUCCESS task instance id: 671 process instance id: 211 context: null
   > [INFO] 2022-09-16 07:50:01.146 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - 
work flow 211 task 671 state:SUCCESS 
   > [INFO] 2022-09-16 07:50:01.150 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - 
add task to stand by list, task name:flightradar_top_routes, task id:0, task 
code:6629036562400
   > [INFO] 2022-09-16 07:50:01.150 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - 
taskCode: 6629036562400 completeDependTaskList: [6629036562272]
   > [INFO] 2022-09-16 07:50:01.151 
org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start 
submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:50:01.153 
org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit 
task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS 
complete, instance id:211 state: RUNNING_EXECUTION  
   > [INFO] 2022-09-16 07:50:01.159 
org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] 
- master submit success, task id:672, task name:flightradar_top_routes, process 
id:211
   > [INFO] 2022-09-16 07:50:01.161 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - 
remove task from stand by list, id: 672 name:flightradar_top_routes
   > [INFO] 2022-09-16 07:50:01.168 
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - 
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 
07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', 
status=1, 
logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log',
 
executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672',
 processInstanceId='0'}
   > [INFO] 2022-09-16 07:50:01.168 
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - 
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 
07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', 
status=1, 
logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log',
 
executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672',
 processInstanceId='211'}
   > [INFO] 2022-09-16 07:50:02.022 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
 - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:02.025 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
 - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:50:41.424 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started
   > [INFO] 2022-09-16 07:50:41.426 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - 
need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   > [INFO] 2022-09-16 07:50:41.429 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - 
start master[dolphin-master-1.dolphin-master-headless:5678] failover, process 
list size:2
   > [INFO] 2022-09-16 07:50:41.431 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - 
failover task instance id: 672, process instance id: 211
   > [INFO] 2022-09-16 07:50:41.424 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started
   > [INFO] 2022-09-16 07:50:41.426 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - 
need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   > [INFO] 2022-09-16 07:50:41.429 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - 
start master[dolphin-master-1.dolphin-master-headless:5678] failover, process 
list size:2
   > **[INFO] 2022-09-16 07:50:41.431** 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - 
failover task instance id: 672, process instance id: 211
   > [INFO] 2022-09-16 07:50:42.432 
org.apache.dolphinscheduler.service.log.LogClientService:[117] - view log path 
/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log
   > [INFO] 2022-09-16 07:50:42.441 
org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client 
closed
   > [INFO] 2022-09-16 07:50:42.445 
org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client 
closed
   > [ERROR] 2022-09-16 07:50:42.445 
org.apache.dolphinscheduler.server.utils.ProcessUtils:[211] - kill yarn job 
failure
   > java.lang.NullPointerException: null
   >    at 
org.apache.dolphinscheduler.server.utils.ProcessUtils.killYarnJob(ProcessUtils.java:197)
   >    at 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverTaskInstance(MasterRegistryClient.java:490)
   >    at 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverMaster(MasterRegistryClient.java:443)
   >    at 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread.run(FailoverExecuteThread.java:80)
   > [INFO] 2022-09-16 07:50:42.447 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - 
master[dolphin-master-1.dolphin-master-headless:5678] failover end, 
useTime:1019ms
   > [INFO] 2022-09-16 07:50:43.147 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - 
handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:50:43.148 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - 
already exists handler process size:0
   > [INFO] 2022-09-16 07:50:43.148 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - 
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: 
NEED_FAULT_TOLERANCE task instance id: 672 process instance id: 211 context: 
null
   > [INFO] 2022-09-16 07:50:43.149 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - 
work flow 211 task 672 state:NEED_FAULT_TOLERANCE 
   > [INFO] 2022-09-16 07:50:43.150 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - 
add task to stand by list, task name:flightradar_top_routes, task id:672, task 
code:6629036562400
   > **[INFO] 2022-09-16 07:53:58.445** 
org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] 
- received command : TaskExecuteResponseCommand{taskInstanceId=672, 
processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:53:58 
CST 2022, host=null, logPath=null, executePath=null, processId=24435, 
appIds='', varPool=[]}
   > **[INFO] 2022-09-16 07:53:59.041** 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
 - already exists handler process size:0
   > [INFO] 2022-09-16 07:53:59.044 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
 - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:53:59.168 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - 
handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:53:59.168 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - 
already exists handler process size:0
   > **[INFO] 2022-09-16 07:53:59.168** 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - 
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: 
SUCCESS task instance id: 672 process instance id: 211 context: null
   > [INFO] 2022-09-16 07:53:59.169 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - 
work flow 211 task 672 state:SUCCESS 
   > [INFO] 2022-09-16 07:53:59.172 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - 
taskCode: 6629036562400 completeDependTaskList: [6629036562272, 6629036562400]
   > [INFO] 2022-09-16 07:53:59.173 
org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start 
submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:53:59.175 
org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit 
task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS 
complete, instance id:211 state: RUNNING_EXECUTION
   > ```
   > 
   > Note at **[INFO] 2022-09-16 07:50:41.431** the master somehow starts to 
failover with running task 672 in worker, and in **[INFO] 2022-09-16 
07:53:59.168** the task status is SUCCESS but the master submits the task again 
in **[INFO] 2022-09-16 07:53:59.173**
   
   The log taskInstance id 672 does not end


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to