wcmolin commented on issue #13283:
URL: 
https://github.com/apache/dolphinscheduler/issues/13283#issuecomment-1384990673

   Sorry, I can't reproduce it. When to restart the worker server? 10 seconds 
or so at the start of the task? Or more than a minute. If possible, can you 
chat with me on WeChat: `wcmolin`
   
   here is the master's log:
   ```
   [INFO] 2023-01-17 15:59:28.968 
org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - 
received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, 
cacheKey=8256494422080}
   [INFO] 2023-01-17 15:59:28.994 
org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache 
evict, type:processDefinition, key:8256494422080
   [INFO] 2023-01-17 15:59:29.451 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - 
find command 8, slot:0 :
   [INFO] 2023-01-17 15:59:29.453 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - 
find one command: id: 8, type: START_PROCESS
   [INFO] 2023-01-17 15:59:29.524 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - 
handle command end, command 8 process 85 start...
   [INFO] 2023-01-17 15:59:29.540 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1372] - 
add task to stand by list, task name:test_exit, task id:0, task 
code:8256472748480
   [INFO] 2023-01-17 15:59:29.557 
org.apache.dolphinscheduler.service.process.ProcessService:[1100] - start 
submit task : test_exit, instance id:85, state: RUNNING_EXECUTION
   [INFO] 2023-01-17 15:59:29.597 
org.apache.dolphinscheduler.service.process.ProcessService:[1113] - end submit 
task to db successfully:108 test_exit state:SUBMITTED_SUCCESS complete, 
instance id:85 state: RUNNING_EXECUTION  
   [INFO] 2023-01-17 15:59:29.611 
org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] 
- master submit success, task id:108, task name:test_exit, process id:85
   [INFO] 2023-01-17 15:59:29.613 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1386] - 
remove task from stand by list, id: 108 name:test_exit
   [INFO] 2023-01-17 15:59:30.994 
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - 
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=108, startTime=Tue Jan 17 
15:59:30 CST 2023, host='10.66.76.129:1234', status=1, 
logPath='/Users/molin/dev/github/dolphinscheduler/logs/8256494422080_3/85/108.log',
 
executePath='/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108',
 processInstanceId='85'}
   [INFO] 2023-01-17 15:59:31.957 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
 - already exists handler process size:0
   [INFO] 2023-01-17 15:59:32.064 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
 - handle events 85 succeeded.
   [INFO] 2023-01-17 15:59:43.249 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started
   [INFO] 2023-01-17 15:59:43.262 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - 
need failover hosts:[10.66.76.129:5678]
   [INFO] 2023-01-17 15:59:43.317 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[430] - 
start master[10.66.76.129:5678] failover, process list size:1
   [INFO] 2023-01-17 15:59:43.330 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[462] - 
master[10.66.76.129:5678] failover end, useTime:37ms
   [INFO] 2023-01-17 15:59:49.665 
org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] 
- received command : TaskExecuteResponseCommand{taskInstanceId=108, 
processInstanceId=85, status=6, startTime=null, endTime=Tue Jan 17 15:59:49 CST 
2023, host=null, logPath=null, executePath=null, processId=16894, appIds='', 
varPool=[]}
   [INFO] 2023-01-17 15:59:50.004 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
 - already exists handler process size:0
   [INFO] 2023-01-17 15:59:50.061 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
 - handle events 85 succeeded.
   [INFO] 2023-01-17 15:59:50.983 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - 
handle process instance : 85 , events count:1
   [INFO] 2023-01-17 15:59:50.984 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - 
already exists handler process size:0
   [INFO] 2023-01-17 15:59:50.986 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[314] - 
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: 
FAILURE task instance id: 108 process instance id: 85 context: null
   [INFO] 2023-01-17 15:59:50.992 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[428] - 
work flow 85 task 108 state:FAILURE 
   [INFO] 2023-01-17 15:59:51.031 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1299] - 
work flow process instance [id: 85, name:test_retry-3-20230117155929472], state 
change from RUNNING_EXECUTION to FAILURE, cmd type: START_PROCESS
   [INFO] 2023-01-17 15:59:51.048 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[501] - 
process:85 state FAILURE change to FAILURE
   [INFO] 2023-01-17 15:59:51.150 
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[139] - 
process instance 85 finished.
   [INFO] 2023-01-17 15:59:52.013 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[219]
 - remove process instance: 85
   [INFO] 2023-01-17 15:59:52.622 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[239] - 
worker group node : /nodes/worker/default/10.66.76.129:1234 down.
   [INFO] 2023-01-17 15:59:52.622 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener:[80]
 - worker node deleted : /nodes/worker/default/10.66.76.129:1234
   [INFO] 2023-01-17 15:59:52.633 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[196] - 
WORKER node deleted : /nodes/worker/default/10.66.76.129:1234
   [INFO] 2023-01-17 15:59:52.635 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[206] - 
path: /nodes/worker/default/10.66.76.129:1234 not exists
   [INFO] 2023-01-17 15:59:52.644 
org.apache.dolphinscheduler.service.registry.RegistryClient:[164] - WORKER 
server dead , and /nodes/worker/default/10.66.76.129:1234 added to zk dead 
server path success
   [INFO] 2023-01-17 15:59:52.694 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[380] - 
start worker[10.66.76.129:1234] failover, task list size:0
   [INFO] 2023-01-17 15:59:52.695 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[406] - 
end worker[10.66.76.129:1234] failover, useTime:40ms
   [INFO] 2023-01-17 16:00:43.334 
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - 
failover execute started
   ```
   here is worker log:
   ```
   [INFO] 2023-01-17 15:59:30.624 
org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[108] 
- received command : 
TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":108,"taskName":"test_exit","firstSubmitTime":"2023-01-17
 
15:59:29","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":8256494422080,"processDefineVersion":3,"appIds":null,"processInstanceId":85,"scheduleTime":null,"globalParams":null,"executorId":1,"cmdTypeIfComplement":0,"tenantCode":"molin","queue":"default","projectCode":7960944102080,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"sleep
 
120\",\"dependence\":{},\"conditionResult\":{\"successNode\":[],\"failedNode\":[]},\"waitStartTimeout\":{},\"switchResult\":{}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","de
 
layTime":0,"currentExecutionStatus":null,"resources":{},"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":null},"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
   [INFO] 2023-01-17 15:59:30.881 
org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success 
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
   [INFO] 2023-01-17 15:59:30.881 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[129] - task 
instance local execute path : 
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
   [INFO] 2023-01-17 15:59:30.978 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the 
task begins to execute. task instance id: 108
   [INFO] 2023-01-17 15:59:31.029 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[83] - shell task 
params {"resourceList":[],"localParams":[],"rawScript":"sleep 
120","dependence":{},"conditionResult":{"successNode":[],"failedNode":[]},"waitStartTimeout":{},"switchResult":{}}
   [INFO] 2023-01-17 15:59:31.117 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[137] - raw script : 
sleep 120
   [INFO] 2023-01-17 15:59:31.117 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute 
path : /tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
   [INFO] 2023-01-17 15:59:31.125 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[86] - tenantCode 
user:molin, task dir:85_108
   [INFO] 2023-01-17 15:59:31.126 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[91] - create command 
file:/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108.command
   [INFO] 2023-01-17 15:59:31.126 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[117] - command : 
#!/bin/sh
   BASEDIR=$(cd `dirname $0`; pwd)
   cd $BASEDIR
   source /etc/profile
   
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108_node.sh
   [INFO] 2023-01-17 15:59:31.143 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[287] - task run 
command: sudo -u molin sh 
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108.command
   [INFO] 2023-01-17 15:59:31.146 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start, 
process id is: 16894
   [INFO] 2023-01-17 15:59:32.146 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[66] -  -> welcome to 
use bigdata scheduling system...
   Disconnected from the target VM, address: '127.0.0.1:56983', transport: 
'socket'
   [INFO] 2023-01-17 15:59:49.578 TaskLogLogger-class 
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has 
exited, execute 
path:/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108, 
processId:16894 ,exitStatusCode:130 ,processWaitForStatus:true 
,processExitValue:130
   [INFO] 2023-01-17 15:59:49.600 
org.apache.dolphinscheduler.server.worker.WorkerServer:[193] - worker server is 
stopping ..., cause : shutdownHook
   [INFO] 2023-01-17 15:59:49.657 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[213] - task 
instance id : 108,task final status : FAILURE
   [INFO] 2023-01-17 15:59:49.664 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[276] - 
develop mode is: false
   [INFO] 2023-01-17 15:59:49.673 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[294] - exec 
local path: 
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108 cleared.
   [INFO] 2023-01-17 15:59:49.775 org.quartz.core.QuartzScheduler:[585] - 
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2023-01-17 15:59:49.785 
org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting 
down Quartz Scheduler
   [INFO] 2023-01-17 15:59:49.790 org.quartz.core.QuartzScheduler:[666] - 
Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
   [INFO] 2023-01-17 15:59:49.791 org.quartz.core.QuartzScheduler:[585] - 
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2023-01-17 15:59:49.793 org.quartz.core.QuartzScheduler:[740] - 
Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
   [INFO] 2023-01-17 15:59:49.812 com.zaxxer.hikari.HikariDataSource:[350] - 
DolphinScheduler - Shutdown initiated...
   [INFO] 2023-01-17 15:59:49.864 com.zaxxer.hikari.HikariDataSource:[352] - 
DolphinScheduler - Shutdown completed.
   
   Process finished with exit code 0
   
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to