inventertom opened a new issue, #14724:
URL: https://github.com/apache/dolphinscheduler/issues/14724

   ### Search before asking
   
   - [X] I have searched the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   The master node keeps going down. After each restart, it goes down again 
within about 20 minutes.
   
   master log
   [INFO] 2023-08-09 12:44:37.053 +0800 
org.apache.dolphinscheduler.server.master.event.WorkflowEventQueue:[38] - 
[WorkflowInstance-0][TaskInstance-0] - Added workflow event to workflowEvent 
queue, event: WorkflowEvent(workflowEventType=START_WORKFLOW, 
workflowInstanceId=3869189)
   [INFO] 2023-08-09 12:44:37.716 +0800 
org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69]
 - [WorkflowInstance-4095180][TaskInstance-10025740] - Received task execute 
result, event: TaskEvent(taskInstanceId=10025740, 
workerAddress=10.2.236.26:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 
CST 2023, endTime=Wed Aug 09 12:44:37 CST 2023, 
executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025740,
 
logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025740.log,
 processId=210062, appIds=, event=RESULT, varPool=[], channel=[id: 0xed8d5e82, 
L:/10.2.236.22:5678 - R:/10.2.236.26:47650], processInstanceId=4095180)
   [INFO] 2023-08-09 12:44:37.717 +0800 
org.apache.dolphinscheduler.server.master.processor.queue.TaskEventService:[126]
 - [WorkflowInstance-0][TaskInstance-0] - StateEventResponseWorker stopped
   [INFO] 2023-08-09 12:44:38.258 +0800 
org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69]
 - [WorkflowInstance-4095180][TaskInstance-10025743] - Received task execute 
result, event: TaskEvent(taskInstanceId=10025743, 
workerAddress=10.2.236.22:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 
CST 2023, endTime=Wed Aug 09 12:44:38 CST 2023, 
executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025743,
 
logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025743.log,
 processId=134273, appIds=, event=RESULT, varPool=[], channel=[id: 0x674aa614, 
L:/10.2.236.22:5678 - R:/10.2.236.22:58110], processInstanceId=4095180)
   [INFO] 2023-08-09 12:44:38.740 +0800 
org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69]
 - [WorkflowInstance-4095180][TaskInstance-10025741] - Received task execute 
result, event: TaskEvent(taskInstanceId=10025741, 
workerAddress=10.2.236.21:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 
CST 2023, endTime=Wed Aug 09 12:44:38 CST 2023, 
executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025741,
 
logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025741.log,
 processId=20121, appIds=, event=RESULT, varPool=[], channel=[id: 0xd4b9b54d, 
L:/10.2.236.22:5678 - R:/10.2.236.21:56094], processInstanceId=4095180)
   [INFO] 2023-08-09 12:44:39.084 +0800 
org.apache.dolphinscheduler.server.master.processor.TaskExecuteResponseProcessor:[69]
 - [WorkflowInstance-4095180][TaskInstance-10025742] - Received task execute 
result, event: TaskEvent(taskInstanceId=10025742, 
workerAddress=10.2.236.98:1234, state=SUCCESS, startTime=Wed Aug 09 12:44:33 
CST 2023, endTime=Wed Aug 09 12:44:39 CST 2023, 
executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025742,
 
logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025742.log,
 processId=243081, appIds=, event=RESULT, varPool=[], channel=[id: 0x065b3589, 
L:/10.2.236.22:5678 - R:/10.2.236.25:37772], processInstanceId=4095180)
   [INFO] 2023-08-09 12:44:40.053 +0800 
org.apache.dolphinscheduler.server.master.MasterServer:[133] - 
[WorkflowInstance-0][TaskInstance-0] - Master server is stopping, current cause 
: i was judged to death, release resources and stop myself
   [INFO] 2023-08-09 12:44:40.071 +0800 org.quartz.core.QuartzScheduler:[585] - 
[WorkflowInstance-0][TaskInstance-0] - Scheduler 
DolphinScheduler_$_bigdata021691554605055 paused.
   [INFO] 2023-08-09 12:44:40.113 +0800 
org.eclipse.jetty.server.AbstractConnector:[381] - 
[WorkflowInstance-0][TaskInstance-0] - Stopped 
ServerConnector@6986f93e{HTTP/1.1, (http/1.1)}{0.0.0.0:5679}
   [INFO] 2023-08-09 12:44:40.114 +0800 org.eclipse.jetty.server.session:[149] 
- [WorkflowInstance-0][TaskInstance-0] - node0 Stopped scavenging
   [INFO] 2023-08-09 12:44:40.126 +0800 
org.eclipse.jetty.server.handler.ContextHandler.application:[2347] - 
[WorkflowInstance-0][TaskInstance-0] - Destroying Spring FrameworkServlet 
'dispatcherServlet'
   [INFO] 2023-08-09 12:44:40.129 +0800 
org.eclipse.jetty.server.handler.ContextHandler:[1153] - 
[WorkflowInstance-0][TaskInstance-0] - Stopped 
o.s.b.w.e.j.JettyEmbeddedWebAppContext@2f8c4fae{application,/,[file:///tmp/jetty-docbase.5679.5733904082162478176/],STOPPED}
   [INFO] 2023-08-09 12:44:40.169 +0800 
org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[109] - 
[WorkflowInstance-0][TaskInstance-0] - Closing Master RPC Server...
   [INFO] 2023-08-09 12:44:40.200 +0800 
org.apache.dolphinscheduler.remote.NettyRemotingServer:[212] - 
[WorkflowInstance-0][TaskInstance-0] - netty server closed
   [INFO] 2023-08-09 12:44:40.201 +0800 
org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[111] - 
[WorkflowInstance-0][TaskInstance-0] - Closed Master RPC Server...
   [INFO] 2023-08-09 12:44:40.202 +0800 
org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - 
[WorkflowInstance-0][TaskInstance-0] - Shutting down Quartz Scheduler
   [WARN] 2023-08-09 12:44:40.202 +0800 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[125]
 - [WorkflowInstance-0][TaskInstance-0] - State event loop service interrupted, 
will stop this loop
   java.lang.InterruptedException: null
           at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
           at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
           at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
           at 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:121)
   [INFO] 2023-08-09 12:44:40.202 +0800 org.quartz.core.QuartzScheduler:[666] - 
[WorkflowInstance-0][TaskInstance-0] - Scheduler 
DolphinScheduler_$_bigdata021691554605055 shutting down.
   [INFO] 2023-08-09 12:44:40.202 +0800 org.quartz.core.QuartzScheduler:[585] - 
[WorkflowInstance-0][TaskInstance-0] - Scheduler 
DolphinScheduler_$_bigdata021691554605055 paused.
   [INFO] 2023-08-09 12:44:40.203 +0800 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[132]
 - [WorkflowInstance-0][TaskInstance-0] - State event loop service stopped
   [INFO] 2023-08-09 12:44:40.222 +0800 org.quartz.core.QuartzScheduler:[740] - 
[WorkflowInstance-0][TaskInstance-0] - Scheduler 
DolphinScheduler_$_bigdata021691554605055 shutdown complete.
   [INFO] 2023-08-09 12:44:40.222 +0800 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[117] 
- [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopping...
   [INFO] 2023-08-09 12:44:40.223 +0800 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[118] 
- [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopped...
   [INFO] 2023-08-09 12:44:40.228 +0800 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[226] - 
[WorkflowInstance-0][TaskInstance-0] - Master node : 10.2.236.22:5678 
unRegistry to register center.
   [INFO] 2023-08-09 12:44:40.228 +0800 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[228] - 
[WorkflowInstance-0][TaskInstance-0] - MasterServer heartbeat executor shutdown
   [INFO] 2023-08-09 12:44:40.232 +0800 
org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - 
[WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting
   [INFO] 2023-08-09 12:44:40.235 +0800 org.apache.zookeeper.ClientCnxn:[522] - 
[WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 
0x101fc7a0c560045
   [INFO] 2023-08-09 12:44:40.235 +0800 org.apache.zookeeper.ZooKeeper:[693] - 
[WorkflowInstance-0][TaskInstance-0] - Session: 0x101fc7a0c560045 closed
   [INFO] 2023-08-09 12:44:40.236 +0800 
org.apache.dolphinscheduler.server.master.processor.queue.TaskExecuteRunnable:[57]
 - [WorkflowInstance-4095180][TaskInstance-10025740] - Handle task event begin: 
TaskEvent(taskInstanceId=10025740, workerAddress=10.2.236.26:1234, 
state=SUCCESS, startTime=Wed Aug 09 12:44:33 CST 2023, endTime=Wed Aug 09 
12:44:37 CST 2023, 
executePath=/data1/dolphinscheduler/exec/process/8634070405248/8634164153728_8/4095180/10025740,
 
logPath=/opt/dolphinscheduler/worker-server/logs/20230809/8634164153728_8-4095180-10025740.log,
 processId=210062, appIds=, event=RESULT, varPool=[], channel=[id: 0xed8d5e82, 
L:/10.2.236.22:5678 ! R:/10.2.236.26:47650], processInstanceId=4095180)
   [INFO] 2023-08-09 12:44:40.247 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[98] 
- [WorkflowInstance-4095180][TaskInstance-10025740] - Submit state event 
success, stateEvent: StateEvent(key=null, type=TASK_STATE_CHANGE, 
executionStatus=SUCCESS, taskInstanceId=10025740, taskCode=0, 
processInstanceId=4095180, context=null, channel=null)
   
   
   
   master out log
   Exception in thread "Master-Server" 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'masterServer': Unsatisfied dependency expressed 
through field 'masterRPCServer'; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'masterRPCServer': Invocation of init method failed; nested exception 
is org.apache.dolphinscheduler.remote.exceptions.RemoteException: 
NettyRemotingServer bind 5678 fail
           at 
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
           at 
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:639)
           at 
org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
           at 
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:619)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
           at 
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
           at 
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:944)
           at 
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
           at 
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
           at 
org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)
           at 
org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754)
           at 
org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:434)
           at 
org.springframework.boot.SpringApplication.run(SpringApplication.java:338)
           at 
org.springframework.boot.SpringApplication.run(SpringApplication.java:1343)
           at 
org.springframework.boot.SpringApplication.run(SpringApplication.java:1332)
           at 
org.apache.dolphinscheduler.server.master.MasterServer.main(MasterServer.java:78)
   Caused by: org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'masterRPCServer': Invocation of init method failed; 
nested exception is 
org.apache.dolphinscheduler.remote.exceptions.RemoteException: 
NettyRemotingServer bind 5678 fail
           at 
org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:160)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
           at 
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
           at 
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
           at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
           at 
org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
           at 
org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1380)
           at 
org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1300)
           at 
org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:656)
           ... 20 more
   Caused by: org.apache.dolphinscheduler.remote.exceptions.RemoteException: 
NettyRemotingServer bind 5678 fail
           at 
org.apache.dolphinscheduler.remote.NettyRemotingServer.start(NettyRemotingServer.java:144)
   "master-server-bigdata02.out" 61L, 6738C
   
   This started suddenly this morning. What is the cause, and how can it be 
resolved?
   
   ### What you expected to happen
   
   The master should remain stable and not go down.
   
   ### How to reproduce
   
   The master goes down. After a restart, it goes down again within about 20 minutes.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

