inventertom opened a new issue, #14724: URL: https://github.com/apache/dolphinscheduler/issues/14724
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

The master node keeps going down. After each restart, it goes down again in about 20 minutes.

master log:

[INFO] 2023-08-09 12:44:37.053 +0800 org.apache.dolphinscheduler.server.master.event.WorkflowEventQueue:[38] - [WorkflowInstance-0][TaskInstance-0] - Added workflow event to workflowEvent queue, event: WorkflowEvent(workflowEventType=START_WORKFLOW, workflowInstanceId=3869189)
[INFO] 2023-08-09 12:44:37.716 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69] - [WorkflowInstance-4095180][TaskInstance-10025740] - Received task execute result, event: TaskEvent(taskInstanceId=10025740, workerAddress=10.2.236.26:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 12:44:37 CST 2023, executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025740, logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025740.log, processId=210062, appIds=, event=RESULT, varPool=[], channel=[id: 0xed8d5e82, L:/10.2.236.22:5678 - R:/10.2.236.26:47650], processInstanceId=4095180)
[INFO] 2023-08-09 12:44:37.717 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskEventService:[126] - [WorkflowInstance-0][TaskInstance-0] - StateEventResponseWorker stopped
[INFO] 2023-08-09 12:44:38.258 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69] - [WorkflowInstance-4095180][TaskInstance-10025743] - Received task execute result, event: TaskEvent(taskInstanceId=10025743, workerAddress=10.2.236.22:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 12:44:38 CST 2023, executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025743, logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025743.log, processId=134273, appIds=, event=RESULT, varPool=[], channel=[id: 0x674aa614, L:/10.2.236.22:5678 - R:/10.2.236.22:58110], processInstanceId=4095180)
[INFO] 2023-08-09 12:44:38.740 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69] - [WorkflowInstance-4095180][TaskInstance-10025741] - Received task execute result, event: TaskEvent(taskInstanceId=10025741, workerAddress=10.2.236.21:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 12:44:38 CST 2023, executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025741, logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025741.log, processId=20121, appIds=, event=RESULT, varPool=[], channel=[id: 0xd4b9b54d, L:/10.2.236.22:5678 - R:/10.2.236.21:56094], processInstanceId=4095180)
[INFO] 2023-08-09 12:44:39.084 +0800 org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69] - [WorkflowInstance-4095180][TaskInstance-10025742] - Received task execute result, event: TaskEvent(taskInstanceId=10025742, workerAddress=10.2.236.98:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 12:44:39 CST 2023, executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025742, logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025742.log, processId=243081, appIds=, event=RESULT, varPool=[], channel=[id: 0x065b3589, L:/10.2.236.22:5678 - R:/10.2.236.25:37772], processInstanceId=4095180)
[INFO] 2023-08-09 12:44:40.053 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[133] - [WorkflowInstance-0][TaskInstance-0] - Master server is stopping, current cause : i was judged to death, release resources and stop myself
[INFO] 2023-08-09 12:44:40.071 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_bigdata021691554605055 paused.
[INFO] 2023-08-09 12:44:40.113 +0800 org.eclipse.jetty.server.AbstractConnector:[381] - [WorkflowInstance-0][TaskInstance-0] - Stopped ServerConnector@6986f93e{HTTP/1.1, (http/1.1)}{0.0.0.0:5679}
[INFO] 2023-08-09 12:44:40.114 +0800 org.eclipse.jetty.server.session:[149] - [WorkflowInstance-0][TaskInstance-0] - node0 Stopped scavenging
[INFO] 2023-08-09 12:44:40.126 +0800 org.eclipse.jetty.server.handler.ContextHandler.application:[2347] - [WorkflowInstance-0][TaskInstance-0] - Destroying Spring FrameworkServlet 'dispatcherServlet'
[INFO] 2023-08-09 12:44:40.129 +0800 org.eclipse.jetty.server.handler.ContextHandler:[1153] - [WorkflowInstance-0][TaskInstance-0] - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@2f8c4fae{application,/,[file:///tmp/jetty-docbase.5679.5733904082162478176/],STOPPED}
[INFO] 2023-08-09 12:44:40.169 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[109] - [WorkflowInstance-0][TaskInstance-0] - Closing Master RPC Server...
[INFO] 2023-08-09 12:44:40.200 +0800 org.apache.dolphinscheduler.remote.NettyRemotingServer:[212] - [WorkflowInstance-0][TaskInstance-0] - netty server closed
[INFO] 2023-08-09 12:44:40.201 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[111] - [WorkflowInstance-0][TaskInstance-0] - Closed Master RPC Server...
[INFO] 2023-08-09 12:44:40.202 +0800 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - [WorkflowInstance-0][TaskInstance-0] - Shutting down Quartz Scheduler
[WARN] 2023-08-09 12:44:40.202 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[125] - [WorkflowInstance-0][TaskInstance-0] - State event loop service interrupted, will stop this loop
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:121)
[INFO] 2023-08-09 12:44:40.202 +0800 org.quartz.core.QuartzScheduler:[666] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_bigdata021691554605055 shutting down.
[INFO] 2023-08-09 12:44:40.202 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_bigdata021691554605055 paused.
[INFO] 2023-08-09 12:44:40.203 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[132] - [WorkflowInstance-0][TaskInstance-0] - State event loop service stopped
[INFO] 2023-08-09 12:44:40.222 +0800 org.quartz.core.QuartzScheduler:[740] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_bigdata021691554605055 shutdown complete.
[INFO] 2023-08-09 12:44:40.222 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[117] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopping...
[INFO] 2023-08-09 12:44:40.223 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[118] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopped...
[INFO] 2023-08-09 12:44:40.228 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[226] - [WorkflowInstance-0][TaskInstance-0] - Master node : 10.2.236.22:5678 unRegistry to register center.
[INFO] 2023-08-09 12:44:40.228 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[228] - [WorkflowInstance-0][TaskInstance-0] - MasterServer heartbeat executor shutdown
[INFO] 2023-08-09 12:44:40.232 +0800 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - [WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting
[INFO] 2023-08-09 12:44:40.235 +0800 org.apache.zookeeper.ClientCnxn:[522] - [WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 0x101fc7a0c560045
[INFO] 2023-08-09 12:44:40.235 +0800 org.apache.zookeeper.ZooKeeper:[693] - [WorkflowInstance-0][TaskInstance-0] - Session: 0x101fc7a0c560045 closed
[INFO] 2023-08-09 12:44:40.236 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[57] - [WorkflowInstance-4095180][TaskInstance-10025740] - Handle task event begin: TaskEvent(taskInstanceId=10025740, workerAddress=10.2.236.26:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 12:44:37 CST 2023, executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025740, logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025740.log, processId=210062, appIds=, event=RESULT, varPool=[], channel=[id: 0xed8d5e82, L:/10.2.236.22:5678 ! R:/10.2.236.26:47650], processInstanceId=4095180)
[INFO] 2023-08-09 12:44:40.247 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[98] - [WorkflowInstance-4095180][TaskInstance-10025740] - Submit state event success, stateEvent: StateEvent(key=null, type=TASK_STATE_CHANGE, executionStatus=SUCCESS, taskInstanceId=10025740, taskCode=0, processInstanceId=4095180, context=null, channel=null)

master out log:

Exception in thread "Master-Server" org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'masterServer': Unsatisfied dependency expressed through field 'masterRPCServer'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'masterRPCServer': Invocation of init method failed; nested exception is org.apache.dolphinscheduler.remote.exceptions.RemoteException: NettyRemotingServer bind 5678 fail
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:639)
    at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:944)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:434)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:338)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1343)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1332)
    at org.apache.dolphinscheduler.server.master.MasterServer.main(MasterServer.java:78)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'masterRPCServer': Invocation of init method failed; nested exception is org.apache.dolphinscheduler.remote.exceptions.RemoteException: NettyRemotingServer bind 5678 fail
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:160)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1380)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1300)
    at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:656)
    ... 20 more
Caused by: org.apache.dolphinscheduler.remote.exceptions.RemoteException: NettyRemotingServer bind 5678 fail
    at org.apache.dolphinscheduler.remote.NettyRemotingServer.start(NettyRemotingServer.java:144)
"master-server-bigdata02.out" 61L, 6738C

This started suddenly this morning. What is the cause, and how can it be fixed?

### What you expected to happen

The master is stable and does not go down.

### How to reproduce

The master goes down. After the machine restarts, the same thing happens again within about 20 minutes.

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
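
A triage note on the `NettyRemotingServer bind 5678 fail` entry in the master out log: a bind failure usually means something on the host is still holding the port at startup, though the logs in this report do not confirm that. The following is only a minimal sketch for checking it on the master host. The helper name `can_bind` is hypothetical and not part of DolphinScheduler, and port 5678 is taken from the log above.

```python
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a plain TCP socket can bind host:port right now.

    Hypothetical helper for triage only. It does not set SO_REUSEADDR,
    so it may be stricter than the server's own bind attempt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another socket already owns the port.
            return False
        return True


if __name__ == "__main__":
    port = 5678  # master RPC port seen in the log above
    state = "free" if can_bind(port) else "in use or not bindable"
    print(f"port {port} is {state}")
```

If the port is reported as in use while the master is supposed to be stopped, it may be worth checking whether an earlier master process is still running before starting a new one.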
