JaayYoung opened a new issue, #13578: URL: https://github.com/apache/dolphinscheduler/issues/13578
### Search before asking - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues. ### What happened -0][TaskInstance-0] - Cannot find workflowExecuteThread from cacheManager, event: TaskEvent(taskInstanceId=10292, workerAddress=192.168.57.234:1234, state=SUCCESS, startTime=Wed Jan 18 08:30:01 CST 2023, endTime=Wed Jan 18 08:30:32 CST 2023, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, processId=32385, appIds=application_1673441345280_1068, event=RESULT, varPool=[], channel=[id: 0xb8fef7cc, L:/192.168.57.118:5678 - R:/192.168.57.234:43624], processInstanceId=4823) [INFO] 2023-02-16 22:44:53.627 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - [WorkflowInstance-0][TaskInstance-0] - taskExecuteRunningCommand: TaskExecuteRunningCommand(super=BaseCommand(messageSenderAddress=192.168.57.234:1234, messageReceiverAddress=192.168.57.118:5678, messageSendTime=1676558693614), taskInstanceId=10292, processInstanceId=4823, startTime=Wed Jan 18 08:30:01 CST 2023, host=192.168.57.234:1234, status=1, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, processId=0, appIds=null) [WARN] 2023-02-16 22:44:53.627 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[75] - [WorkflowInstance-0][TaskInstance-0] - Cannot find workflowExecuteThread from cacheManager, event: TaskEvent(taskInstanceId=10292, workerAddress=192.168.57.234:1234, state=RUNNING_EXECUTION, startTime=Wed Jan 18 08:30:01 CST 2023, endTime=null, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, processId=0, appIds=null, event=RUNNING, varPool=null, channel=[id: 0xb8fef7cc, L:/192.168.57.118:5678 - R:/192.168.57.234:43624], processInstanceId=4823) [INFO] 2023-02-16 22:45:24.631 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69] - [WorkflowInstance-4823][TaskInstance-10292] - Received task execute result, event: TaskEvent(taskInstanceId=10292, workerAddress=192.168.57.234:1234, state=SUCCESS, startTime=Wed Jan 18 08:30:01 CST 2023, endTime=Wed Jan 18 08:30:32 CST 2023, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, processId=32385, appIds=application_1673441345280_1068, event=RESULT, varPool=[], channel=[id: 0xb8fef7cc, L:/192.168.57.118:5678 - R:/192.168.57.234:43624], processInstanceId=4823) [WARN] 2023-02-16 22:45:24.631 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[75] - [WorkflowInstance-0][TaskInstance-0] - Cannot find workflowExecuteThread from cacheManager, event: TaskEvent(taskInstanceId=10292, workerAddress=192.168.57.234:1234, state=SUCCESS, startTime=Wed Jan 18 08:30:01 CST 2023, endTime=Wed Jan 18 08:30:32 CST 2023, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, processId=32385, appIds=application_1673441345280_1068, event=RESULT, varPool=[], channel=[id: 0xb8fef7cc, L:/192.168.57.118:5678 - R:/192.168.57.234:43624], processInstanceId=4823) [INFO] 2023-02-16 22:49:53.663 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteRunningProcessor:[58] - [WorkflowInstance-0][TaskInstance-0] - taskExecuteRunningCommand: TaskExecuteRunningCommand(super=BaseCommand(messageSenderAddress=192.168.57.234:1234, messageReceiverAddress=192.168.57.118:5678, messageSendTime=1676558993645), taskInstanceId=10292, processInstanceId=4823, startTime=Wed Jan 18 08:30:01 CST 2023, host=192.168.57.234:1234, status=1, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, processId=0, appIds=null) [WARN] 2023-02-16 22:49:53.663 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteThreadPool:[75] - [WorkflowInstance-0][TaskInstance-0] - Cannot find workflowExecuteThread from cacheManager, event: TaskEvent(taskInstanceId=10292, workerAddress=192.168.57.234:1234, state=RUNNING_EXECUTION, startTime=Wed Jan 18 08:30:01 CST 2023, endTime=null, executePath=/tmp/dolphinscheduler/exec/process/6699118561920/7892616947584_73/4823/10292, logPath=/opt/dolphinscheduler/dolphinscheduler-bin/worker-server/logs/20230118/7892616947584_73-4823-10292.log, processId=0, appIds=null, event=RUNNING, varPool=null, channel=[id: 0xb8fef7cc, L:/192.168.57.118:5678 - R:/192.168.57.234:43624], processInstanceId=4823) [INFO] 2023-02-16 22:50:02.457 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_sz-bigdata-coordinate-01.189inner.cn1674027093094 paused. [INFO] 2023-02-16 22:50:02.465 +0800 org.eclipse.jetty.server.AbstractConnector:[381] - [WorkflowInstance-0][TaskInstance-0] - Stopped ServerConnector@1f916219{HTTP/1.1, (http/1.1)}{0.0.0.0:5679} [INFO] 2023-02-16 22:50:02.465 +0800 org.eclipse.jetty.server.session:[149] - [WorkflowInstance-0][TaskInstance-0] - node0 Stopped scavenging [INFO] 2023-02-16 22:50:02.466 +0800 org.eclipse.jetty.server.handler.ContextHandler.application:[2347] - [WorkflowInstance-0][TaskInstance-0] - Destroying Spring FrameworkServlet 'dispatcherServlet' [INFO] 2023-02-16 22:50:02.467 +0800 org.eclipse.jetty.server.handler.ContextHandler:[1153] - [WorkflowInstance-0][TaskInstance-0] - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@1068176{application,/,[file:///tmp/jetty-docbase.5679.3555128580963439342/],STOPPED} [INFO] 2023-02-16 22:50:02.470 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[109] - [WorkflowInstance-0][TaskInstance-0] - Closing Master RPC Server... [INFO] 2023-02-16 22:50:02.471 +0800 org.apache.dolphinscheduler.remote.NettyRemotingServer:[212] - [WorkflowInstance-0][TaskInstance-0] - netty server closed [INFO] 2023-02-16 22:50:02.471 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[111] - [WorkflowInstance-0][TaskInstance-0] - Closed Master RPC Server... [INFO] 2023-02-16 22:50:02.471 +0800 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - [WorkflowInstance-0][TaskInstance-0] - Shutting down Quartz Scheduler [INFO] 2023-02-16 22:50:02.471 +0800 org.quartz.core.QuartzScheduler:[666] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_sz-bigdata-coordinate-01.189inner.cn1674027093094 shutting down. [INFO] 2023-02-16 22:50:02.471 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_sz-bigdata-coordinate-01.189inner.cn1674027093094 paused. [WARN] 2023-02-16 22:50:02.472 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[125] - [WorkflowInstance-0][TaskInstance-0] - State event loop service interrupted, will stop this loop java.lang.InterruptedException: null at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:121) [INFO] 2023-02-16 22:50:02.472 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[132] - [WorkflowInstance-0][TaskInstance-0] - State event loop service stopped [INFO] 2023-02-16 22:50:02.491 +0800 org.quartz.core.QuartzScheduler:[740] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_sz-bigdata-coordinate-01.189inner.cn1674027093094 shutdown complete. [INFO] 2023-02-16 22:50:02.491 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[117] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopping... [INFO] 2023-02-16 22:50:02.491 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[118] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopped... [INFO] 2023-02-16 22:50:02.494 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[226] - [WorkflowInstance-0][TaskInstance-0] - Master node : 192.168.57.118:5678 unRegistry to register center. [INFO] 2023-02-16 22:50:02.494 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[302] - [WorkflowInstance-0][TaskInstance-0] - master node : /nodes/master/192.168.57.118:5678 down. [INFO] 2023-02-16 22:50:02.494 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[125] - [WorkflowInstance-0][TaskInstance-0] - MASTER node deleted : /nodes/master/192.168.57.118:5678 [INFO] 2023-02-16 22:50:02.496 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[228] - [WorkflowInstance-0][TaskInstance-0] - MasterServer heartbeat executor shutdown [INFO] 2023-02-16 22:50:02.496 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[140] - [WorkflowInstance-0][TaskInstance-0] - path: /nodes/master/192.168.57.118:5678 not exists [INFO] 2023-02-16 22:50:02.506 +0800 org.apache.dolphinscheduler.service.registry.RegistryClient:[160] - [WorkflowInstance-0][TaskInstance-0] - MASTER server dead , and /nodes/master/192.168.57.118:5678 added to zk dead server path success [INFO] 2023-02-16 22:50:02.506 +0800 org.apache.dolphinscheduler.server.master.service.FailoverService:[53] - [WorkflowInstance-0][TaskInstance-0] - Master failover starting, masterServer: 192.168.57.118:5678 [WARN] 2023-02-16 22:50:02.506 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[362] - [WorkflowInstance-0][TaskInstance-0] - current addr:192.168.57.118:5678 is not in active master list [INFO] 2023-02-16 22:50:02.506 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[364] - [WorkflowInstance-0][TaskInstance-0] - update master nodes, master size: 0, slot: 0, addr: 192.168.57.118:5678 [INFO] 2023-02-16 22:50:02.508 +0800 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - [WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting [ERROR] 2023-02-16 22:50:02.508 +0800 org.apache.dolphinscheduler.server.master.service.MasterFailoverService:[113] - [WorkflowInstance-0][TaskInstance-0] - Master server failover failed, host:192.168.57.118:5678 org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:215) at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:231) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService.failoverMaster(MasterFailoverService.java:110) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$FastClassBySpringCGLIB$$479c980c.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$EnhancerBySpringCGLIB$$37b784e3.failoverMaster(<generated>) at org.apache.dolphinscheduler.server.master.service.FailoverService.failoverServerWhenDown(FailoverService.java:54) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeMasterNodePath(MasterRegistryClient.java:147) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52) at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128) at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760) at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754) at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753) at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75) at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Lost connection while trying to acquire lock: /lock/failover/masters/192.168.57.118:5678 at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91) at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:204) ... 30 common frames omitted [ERROR] 2023-02-16 22:50:02.509 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[150] - [WorkflowInstance-0][TaskInstance-0] - MASTER server failover failed, host:192.168.57.118:5678 java.lang.NullPointerException: null at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:222) at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:235) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService.failoverMaster(MasterFailoverService.java:115) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$FastClassBySpringCGLIB$$479c980c.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753) at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698) at org.apache.dolphinscheduler.server.master.service.MasterFailoverService$$EnhancerBySpringCGLIB$$37b784e3.failoverMaster(<generated>) at org.apache.dolphinscheduler.server.master.service.FailoverService.failoverServerWhenDown(FailoverService.java:54) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeMasterNodePath(MasterRegistryClient.java:147) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66) at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52) at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128) at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760) at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754) at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753) at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75) at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ### What you expected to happen 日志不反复刷新无法获取此任务状态 The task status cannot be obtained without repeatedly refreshing the log ### How to reproduce The task status is successfully, and the log is continuously refreshed. The task status cannot be obtained by the process ### Anything else _No response_ ### Version 3.0.x ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
