wcmolin commented on issue #13283:
URL:
https://github.com/apache/dolphinscheduler/issues/13283#issuecomment-1384990673
Sorry, I can't reproduce it. When to restart the worker server? 10 seconds
or so at the start of the task? Or more than a minute. If possible, can you
chat with me on WeChat: `wcmolin`
here is the master's log:
```
[INFO] 2023-01-17 15:59:28.968
org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] -
received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION,
cacheKey=8256494422080}
[INFO] 2023-01-17 15:59:28.994
org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache
evict, type:processDefinition, key:8256494422080
[INFO] 2023-01-17 15:59:29.451
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] -
find command 8, slot:0 :
[INFO] 2023-01-17 15:59:29.453
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] -
find one command: id: 8, type: START_PROCESS
[INFO] 2023-01-17 15:59:29.524
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] -
handle command end, command 8 process 85 start...
[INFO] 2023-01-17 15:59:29.540
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1372] -
add task to stand by list, task name:test_exit, task id:0, task
code:8256472748480
[INFO] 2023-01-17 15:59:29.557
org.apache.dolphinscheduler.service.process.ProcessService:[1100] - start
submit task : test_exit, instance id:85, state: RUNNING_EXECUTION
[INFO] 2023-01-17 15:59:29.597
org.apache.dolphinscheduler.service.process.ProcessService:[1113] - end submit
task to db successfully:108 test_exit state:SUBMITTED_SUCCESS complete,
instance id:85 state: RUNNING_EXECUTION
[INFO] 2023-01-17 15:59:29.611
org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164]
- master submit success, task id:108, task name:test_exit, process id:85
[INFO] 2023-01-17 15:59:29.613
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1386] -
remove task from stand by list, id: 108 name:test_exit
[INFO] 2023-01-17 15:59:30.994
org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] -
taskAckCommand : TaskExecuteAckCommand{taskInstanceId=108, startTime=Tue Jan 17
15:59:30 CST 2023, host='10.66.76.129:1234', status=1,
logPath='/Users/molin/dev/github/dolphinscheduler/logs/8256494422080_3/85/108.log',
executePath='/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108',
processInstanceId='85'}
[INFO] 2023-01-17 15:59:31.957
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
- already exists handler process size:0
[INFO] 2023-01-17 15:59:32.064
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
- handle events 85 succeeded.
[INFO] 2023-01-17 15:59:43.249
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] -
failover execute started
[INFO] 2023-01-17 15:59:43.262
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] -
need failover hosts:[10.66.76.129:5678]
[INFO] 2023-01-17 15:59:43.317
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[430] -
start master[10.66.76.129:5678] failover, process list size:1
[INFO] 2023-01-17 15:59:43.330
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[462] -
master[10.66.76.129:5678] failover end, useTime:37ms
[INFO] 2023-01-17 15:59:49.665
org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72]
- received command : TaskExecuteResponseCommand{taskInstanceId=108,
processInstanceId=85, status=6, startTime=null, endTime=Tue Jan 17 15:59:49 CST
2023, host=null, logPath=null, executePath=null, processId=16894, appIds='',
varPool=[]}
[INFO] 2023-01-17 15:59:50.004
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223]
- already exists handler process size:0
[INFO] 2023-01-17 15:59:50.061
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230]
- handle events 85 succeeded.
[INFO] 2023-01-17 15:59:50.983
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] -
handle process instance : 85 , events count:1
[INFO] 2023-01-17 15:59:50.984
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] -
already exists handler process size:0
[INFO] 2023-01-17 15:59:50.986
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[314] -
process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus:
FAILURE task instance id: 108 process instance id: 85 context: null
[INFO] 2023-01-17 15:59:50.992
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[428] -
work flow 85 task 108 state:FAILURE
[INFO] 2023-01-17 15:59:51.031
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1299] -
work flow process instance [id: 85, name:test_retry-3-20230117155929472], state
change from RUNNING_EXECUTION to FAILURE, cmd type: START_PROCESS
[INFO] 2023-01-17 15:59:51.048
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[501] -
process:85 state FAILURE change to FAILURE
[INFO] 2023-01-17 15:59:51.150
org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[139] -
process instance 85 finished.
[INFO] 2023-01-17 15:59:52.013
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[219]
- remove process instance: 85
[INFO] 2023-01-17 15:59:52.622
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[239] -
worker group node : /nodes/worker/default/10.66.76.129:1234 down.
[INFO] 2023-01-17 15:59:52.622
org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener:[80]
- worker node deleted : /nodes/worker/default/10.66.76.129:1234
[INFO] 2023-01-17 15:59:52.633
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[196] -
WORKER node deleted : /nodes/worker/default/10.66.76.129:1234
[INFO] 2023-01-17 15:59:52.635
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[206] -
path: /nodes/worker/default/10.66.76.129:1234 not exists
[INFO] 2023-01-17 15:59:52.644
org.apache.dolphinscheduler.service.registry.RegistryClient:[164] - WORKER
server dead , and /nodes/worker/default/10.66.76.129:1234 added to zk dead
server path success
[INFO] 2023-01-17 15:59:52.694
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[380] -
start worker[10.66.76.129:1234] failover, task list size:0
[INFO] 2023-01-17 15:59:52.695
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[406] -
end worker[10.66.76.129:1234] failover, useTime:40ms
[INFO] 2023-01-17 16:00:43.334
org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] -
failover execute started
```
here is worker log:
```
[INFO] 2023-01-17 15:59:30.624
org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[108]
- received command :
TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":108,"taskName":"test_exit","firstSubmitTime":"2023-01-17
15:59:29","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":8256494422080,"processDefineVersion":3,"appIds":null,"processInstanceId":85,"scheduleTime":null,"globalParams":null,"executorId":1,"cmdTypeIfComplement":0,"tenantCode":"molin","queue":"default","projectCode":7960944102080,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"sleep
120\",\"dependence\":{},\"conditionResult\":{\"successNode\":[],\"failedNode\":[]},\"waitStartTimeout\":{},\"switchResult\":{}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","de
layTime":0,"currentExecutionStatus":null,"resources":{},"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":null},"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
[INFO] 2023-01-17 15:59:30.881
org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
[INFO] 2023-01-17 15:59:30.881
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[129] - task
instance local execute path :
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
[INFO] 2023-01-17 15:59:30.978
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the
task begins to execute. task instance id: 108
[INFO] 2023-01-17 15:59:31.029 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[83] - shell task
params {"resourceList":[],"localParams":[],"rawScript":"sleep
120","dependence":{},"conditionResult":{"successNode":[],"failedNode":[]},"waitStartTimeout":{},"switchResult":{}}
[INFO] 2023-01-17 15:59:31.117 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[137] - raw script :
sleep 120
[INFO] 2023-01-17 15:59:31.117 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute
path : /tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108
[INFO] 2023-01-17 15:59:31.125 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[86] - tenantCode
user:molin, task dir:85_108
[INFO] 2023-01-17 15:59:31.126 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[91] - create command
file:/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108.command
[INFO] 2023-01-17 15:59:31.126 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[117] - command :
#!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /etc/profile
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108_node.sh
[INFO] 2023-01-17 15:59:31.143 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[287] - task run
command: sudo -u molin sh
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108/85_108.command
[INFO] 2023-01-17 15:59:31.146 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start,
process id is: 16894
[INFO] 2023-01-17 15:59:32.146 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[66] - -> welcome to
use bigdata scheduling system...
Disconnected from the target VM, address: '127.0.0.1:56983', transport:
'socket'
[INFO] 2023-01-17 15:59:49.578 TaskLogLogger-class
org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has
exited, execute
path:/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108,
processId:16894 ,exitStatusCode:130 ,processWaitForStatus:true
,processExitValue:130
[INFO] 2023-01-17 15:59:49.600
org.apache.dolphinscheduler.server.worker.WorkerServer:[193] - worker server is
stopping ..., cause : shutdownHook
[INFO] 2023-01-17 15:59:49.657
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[213] - task
instance id : 108,task final status : FAILURE
[INFO] 2023-01-17 15:59:49.664
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[276] -
develop mode is: false
[INFO] 2023-01-17 15:59:49.673
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[294] - exec
local path:
/tmp/dolphinscheduler/exec/process/7960944102080/8256494422080_3/85/108 cleared.
[INFO] 2023-01-17 15:59:49.775 org.quartz.core.QuartzScheduler:[585] -
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2023-01-17 15:59:49.785
org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting
down Quartz Scheduler
[INFO] 2023-01-17 15:59:49.790 org.quartz.core.QuartzScheduler:[666] -
Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
[INFO] 2023-01-17 15:59:49.791 org.quartz.core.QuartzScheduler:[585] -
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2023-01-17 15:59:49.793 org.quartz.core.QuartzScheduler:[740] -
Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
[INFO] 2023-01-17 15:59:49.812 com.zaxxer.hikari.HikariDataSource:[350] -
DolphinScheduler - Shutdown initiated...
[INFO] 2023-01-17 15:59:49.864 com.zaxxer.hikari.HikariDataSource:[352] -
DolphinScheduler - Shutdown completed.
Process finished with exit code 0
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]