q4q5q6qw opened a new issue, #15666:
URL: https://github.com/apache/dolphinscheduler/issues/15666

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   After the system runs properly for a period of time, port 1234 of the worker 
node is disabled,the master is normal, I checked the worker configuration item 
registry-disconnect-strategy.strategy=waiting, 
registry-disconnect-strategy.max-waiting-time=100s by referring to issue:12414, 
and the following error information is displayed in the log:
   
   `[INFO] 2024-03-03 08:07:42.285 +0800 org.apache.zookeeper.ClientCnxn:[1005] 
- [WorkflowInstance-0][TaskInstance-0] - Socket connection established, 
initiating session, client: /10.75.195.147:54462, server: 
10.75.194.59/10.75.194.59:2181
   [WARN] 2024-03-03 08:07:42.286 +0800 org.apache.zookeeper.ClientCnxn:[1292] 
- [WorkflowInstance-0][TaskInstance-0] - Session 0x0 for server 
10.75.194.59/10.75.194.59:2181, Closing socket connection. Attempting reconnect 
except it is a SessionExpiredException.
   org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
additional data from server sessionid 0x0, likely server has closed socket
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282)
   [INFO] 2024-03-03 08:07:42.856 +0800 
org.apache.dolphinscheduler.common.model.BaseHeartBeatTask:[52] - 
[WorkflowInstance-0][TaskInstance-0] - The current server status is WAITING, 
will not write heartBeatInfo into registry
   [ERROR] 2024-03-03 08:07:43.475 +0800 
org.apache.dolphinscheduler.server.worker.registry.WorkerWaitingStrategy:[80] - 
[WorkflowInstance-0][TaskInstance-0] - Disconnect from registry and change the 
current status to waiting error, the current server state is WAITING, will stop 
the current server
   org.apache.dolphinscheduler.common.lifecycle.ServerLifeCycleException: 
Waiting to reconnect to registry in PT1M40S failed
        at 
org.apache.dolphinscheduler.server.worker.registry.WorkerWaitingStrategy.disconnect(WorkerWaitingStrategy.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
        at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:208)
        at com.sun.proxy.$Proxy110.disconnect(Unknown Source)
        at 
org.apache.dolphinscheduler.server.worker.registry.WorkerConnectionStateListener.onUpdate(WorkerConnectionStateListener.java:53)
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperConnectionStateListener.stateChanged(ZookeeperConnectionStateListener.java:43)
        at 
org.apache.curator.framework.state.ConnectionStateManager.lambda$processEvents$0(ConnectionStateManager.java:281)
        at 
org.apache.curator.framework.listen.MappingListenerManager.lambda$forEach$0(MappingListenerManager.java:92)
        at 
org.apache.curator.framework.listen.MappingListenerManager.forEach(MappingListenerManager.java:89)
        at 
org.apache.curator.framework.listen.StandardListenerManager.forEach(StandardListenerManager.java:89)
        at 
org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:281)
        at 
org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:43)
        at 
org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:134)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.dolphinscheduler.registry.api.RegistryException: 
Cannot connect to the Zookeeper registry in 100 s
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.connectUntilTimeout(ZookeeperRegistry.java:133)
        at 
org.apache.dolphinscheduler.registry.api.RegistryClient.connectUntilTimeout(RegistryClient.java:72)
        at 
org.apache.dolphinscheduler.server.worker.registry.WorkerWaitingStrategy.disconnect(WorkerWaitingStrategy.java:71)
        ... 20 common frames omitted
   [INFO] 2024-03-03 08:07:43.668 +0800 org.apache.zookeeper.ClientCnxn:[1171] 
- [WorkflowInstance-0][TaskInstance-0] - Opening socket connection to server 
10.75.195.160/10.75.195.160:2181.
   [INFO] 2024-03-03 08:07:43.668 +0800 org.apache.zookeeper.ClientCnxn:[1173] 
- [WorkflowInstance-0][TaskInstance-0] - SASL config status: Will not attempt 
to authenticate using SASL (unknown error)
   [WARN] 2024-03-03 08:07:43.679 +0800 org.apache.zookeeper.ClientCnxn:[1292] 
- [WorkflowInstance-0][TaskInstance-0] - Session 0x0 for server 
10.75.195.160/10.75.195.160:2181, Closing socket connection. Attempting 
reconnect except it is a SessionExpiredException.
   java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282)
   [INFO] 2024-03-03 08:07:44.683 +0800 org.apache.zookeeper.ClientCnxn:[1171] 
- [WorkflowInstance-0][TaskInstance-0] - Opening socket connection to server 
10.75.195.174/10.75.195.174:2181.
   [INFO] 2024-03-03 08:07:44.684 +0800 org.apache.zookeeper.ClientCnxn:[1173] 
- [WorkflowInstance-0][TaskInstance-0] - SASL config status: Will not attempt 
to authenticate using SASL (unknown error)
   [INFO] 2024-03-03 08:07:44.684 +0800 org.apache.zookeeper.ClientCnxn:[1005] 
- [WorkflowInstance-0][TaskInstance-0] - Socket connection established, 
initiating session, client: /10.75.195.147:39546, server: 
10.75.195.174/10.75.195.174:2181
   [WARN] 2024-03-03 08:07:44.685 +0800 org.apache.zookeeper.ClientCnxn:[1292] 
- [WorkflowInstance-0][TaskInstance-0] - Session 0x0 for server 
10.75.195.174/10.75.195.174:2181, Closing socket connection. Attempting 
reconnect except it is a SessionExpiredException.
   org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
additional data from server sessionid 0x0, likely server has closed socket
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282)
   [INFO] 2024-03-03 08:07:45.217 +0800 org.apache.zookeeper.ClientCnxn:[1171] 
- [WorkflowInstance-0][TaskInstance-0] - Opening socket connection to server 
10.75.194.59/10.75.194.59:2181.
   [INFO] 2024-03-03 08:07:45.217 +0800 org.apache.zookeeper.ClientCnxn:[1173] 
- [WorkflowInstance-0][TaskInstance-0] - SASL config status: Will not attempt 
to authenticate using SASL (unknown error)
   [INFO] 2024-03-03 08:07:45.217 +0800 org.apache.zookeeper.ClientCnxn:[1005] 
- [WorkflowInstance-0][TaskInstance-0] - Socket connection established, 
initiating session, client: /10.75.195.147:54470, server: 
10.75.194.59/10.75.194.59:2181
   [WARN] 2024-03-03 08:07:45.218 +0800 org.apache.zookeeper.ClientCnxn:[1292] 
- [WorkflowInstance-0][TaskInstance-0] - Session 0x0 for server 
10.75.194.59/10.75.194.59:2181, Closing socket connection. Attempting reconnect 
except it is a SessionExpiredException.
   org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
additional data from server sessionid 0x0, likely server has closed socket
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282)
   [INFO] 2024-03-03 08:07:46.479 +0800 
org.apache.dolphinscheduler.server.worker.WorkerServer:[126] - 
[WorkflowInstance-0][TaskInstance-0] - Worker server is stopping, current cause 
: Disconnect from registry and change the current status to waiting error, the 
current server state is WAITING, will stop the current server
   [INFO] 2024-03-03 08:07:46.479 +0800 
org.apache.dolphinscheduler.server.worker.WorkerServer:[146] - 
[WorkflowInstance-0][TaskInstance-0] - Worker begin to kill all cache task, 
task size: 2
   [INFO] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils:[92] - 
[WorkflowInstance-18203][TaskInstance-18204] - Begin kill task instance, 
processId: 0
   [ERROR] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils:[95] - 
[WorkflowInstance-18203][TaskInstance-18204] - Task instance kill failed, 
processId is not exist
   [INFO] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils:[92] - 
[WorkflowInstance-18206][TaskInstance-18206] - Begin kill task instance, 
processId: 0
   [ERROR] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.plugin.task.api.utils.ProcessUtils:[95] - 
[WorkflowInstance-18206][TaskInstance-18206] - Task instance kill failed, 
processId is not exist
   [INFO] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.server.worker.WorkerServer:[159] - 
[WorkflowInstance-0][TaskInstance-0] - Worker after kill all cache task, task 
size: 2, killed number: 0
   [WARN] 2024-03-03 08:07:46.480 +0800 
org.apache.dolphinscheduler.common.model.BaseHeartBeatTask:[72] - 
[WorkflowInstance-0][TaskInstance-0] - WorkerHeartBeatTask finished...
   [INFO] 2024-03-03 08:07:46.515 +0800 
org.apache.curator.framework.imps.CuratorFrameworkImpl:[998] - 
[WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting
   [WARN] 2024-03-03 08:07:47.007 +0800 org.apache.zookeeper.ClientCnxn:[1286] 
- [WorkflowInstance-0][TaskInstance-0] - An exception was thrown while closing 
send thread for session 0x0.
   java.io.IOException: Connection has already been closed and reconnection is 
not allowed
        at 
org.apache.zookeeper.ClientCnxn$SendThread.changeZkState(ClientCnxn.java:990)
        at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1141)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1200)
   [INFO] 2024-03-03 08:07:47.116 +0800 org.apache.zookeeper.ClientCnxn:[568] - 
[WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 0x0
   [INFO] 2024-03-03 08:07:47.116 +0800 org.apache.zookeeper.ZooKeeper:[1232] - 
[WorkflowInstance-0][TaskInstance-0] - Session: 0x0 closed
   [INFO] 2024-03-03 08:07:47.119 +0800 
org.apache.dolphinscheduler.server.worker.registry.WorkerRegistryClient:[128] - 
[WorkflowInstance-0][TaskInstance-0] - Worker registry client closed
   [INFO] 2024-03-03 08:07:47.119 +0800 
org.apache.dolphinscheduler.server.worker.rpc.WorkerRpcServer:[60] - 
[WorkflowInstance-0][TaskInstance-0] - Worker rpc server closing
   [INFO] 2024-03-03 08:07:47.119 +0800 
org.apache.dolphinscheduler.server.worker.rpc.WorkerRpcServer:[62] - 
[WorkflowInstance-0][TaskInstance-0] - Worker rpc server closed
   [INFO] 2024-03-03 08:07:47.119 +0800 
org.apache.dolphinscheduler.server.worker.WorkerServer:[133] - 
[WorkflowInstance-0][TaskInstance-0] - Worker server stopped, current cause: 
Disconnect from registry and change the current status to waiting error, the 
current server state is WAITING, will stop the current server`
   
   - [ ] 
   
   ### What you expected to happen
   
   Normal scheduling
   
   ### How to reproduce
   
   Reference Log Information
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.2.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 
[email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to