xiaolailong commented on issue #13913:
URL: https://github.com/apache/dolphinscheduler/issues/13913#issuecomment-1818989174

   @Radeity 
   Hi. As you mentioned above, when a reconnect happens, the master cannot 
find itself because its heartbeat information is set to empty in zk. I 
cannot reproduce this bug, and as far as I can see, in MasterHeartBeatTask.java the 
heartbeat information is updated every 10s, so it should not stay empty all the 
time. 
   I also hit this bug in a production environment, so I tried to reproduce it, but I 
failed. Can you give me some help? Thanks! 
   
   > Hi, @minyk, in `MasterConnectionStateListener` of version 3.0.x, when the 
connection state changes to `RECONNECTED`, the master node is removed and a 
new one is created.
   > 
   > 
https://github.com/apache/dolphinscheduler/blob/565bc978eac5a72a073848b440d75b6367b4ad0e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/registry/MasterConnectionStateListener.java#L50-L54
   > 
   > 
   > However, when creating the new ephemeral node, we don't set the heartBeat JSON as 
its initial value, like
   > ```java
   > registryClient.persistEphemeral(masterRegistryPath,
   >         JSONUtils.toJsonString(masterHeartBeatTask.getHeartBeat()));
   > ```
   > 
   > Information about master nodes is only updated when handling node add 
and remove events in `ServerNodeManager`.
   > 
   > 
https://github.com/apache/dolphinscheduler/blob/565bc978eac5a72a073848b440d75b6367b4ad0e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/registry/ServerNodeManager.java#L313-L329
   > 
   > **In `getServerList` of version 3.0.x, if we don't get heartBeat info, we 
skip this node.**
   > 
   > 
https://github.com/apache/dolphinscheduler/blob/565bc978eac5a72a073848b440d75b6367b4ad0e/dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/registry/RegistryClient.java#L94-L103
   > 
   > Thus, when master2 executes `syncMasterNodes`, it cannot find itself in 
`masterPriorityQueue`. The master node information is not updated any more, 
so master2 keeps writing the warning message.
   > 
   > 
https://github.com/apache/dolphinscheduler/blob/565bc978eac5a72a073848b440d75b6367b4ad0e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/registry/ServerNodeManager.java#L356-L363
   > 
   > You can try upgrading your DS version to 3.1.x, where we provide a stop/waiting 
strategy and this bug doesn't exist :D
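
To make the timing in the quoted explanation concrete, here is a minimal sketch of how I understand it. It is a toy in-memory model, not DolphinScheduler code: `ReconnectSketch`, `FakeRegistry`, `reconnectAndSync` and the node paths are hypothetical names I made up, and the only behaviour it borrows from the quote is that `getServerList` skips nodes without heartbeat data and that the sync runs on the node-add event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the reconnect scenario; none of these names are DolphinScheduler APIs. */
class ReconnectSketch {

    /** Hypothetical stand-in for the ZooKeeper-backed registry: path -> node data. */
    static final class FakeRegistry {
        private final Map<String, String> nodes = new LinkedHashMap<>();

        void persistEphemeral(String path, String data) {
            nodes.put(path, data);
        }

        void remove(String path) {
            nodes.remove(path);
        }

        /** Assumption taken from the quote: nodes with no heartbeat data are skipped. */
        List<String> getServerList() {
            List<String> servers = new ArrayList<>();
            for (Map.Entry<String, String> entry : nodes.entrySet()) {
                String heartBeat = entry.getValue();
                if (heartBeat == null || heartBeat.isEmpty()) {
                    continue; // node exists, but it is invisible to the sync
                }
                servers.add(entry.getKey());
            }
            return servers;
        }
    }

    /**
     * Simulates RECONNECTED: the node is removed and recreated with initialData,
     * then the node-add event triggers a sync before any periodic heartbeat write.
     * Returns the server list seen by that sync.
     */
    static List<String> reconnectAndSync(FakeRegistry registry, String selfPath, String initialData) {
        registry.remove(selfPath);
        registry.persistEphemeral(selfPath, initialData);
        return registry.getServerList();
    }

    public static void main(String[] args) {
        FakeRegistry registry = new FakeRegistry();
        registry.persistEphemeral("/nodes/master/master1", "{\"cpu\":0.1}");
        registry.persistEphemeral("/nodes/master/master2", "{\"cpu\":0.2}");
        String self = "/nodes/master/master2";

        // Recreated with an empty value: the sync does not see master2.
        List<String> withoutSeed = reconnectAndSync(registry, self, "");
        System.out.println("empty initial value, self visible: " + withoutSeed.contains(self));

        // Recreated with the heartbeat JSON as the initial value: the sync sees master2.
        List<String> withSeed = reconnectAndSync(registry, self, "{\"cpu\":0.2}");
        System.out.println("seeded initial value, self visible: " + withSeed.contains(self));
    }
}
```

If this model is right, it may explain why the 10s heartbeat does not help: the later heartbeat write only changes the node data, and according to the quote the master list is refreshed only on add and remove events, so the snapshot taken during the empty window is never corrected. Reproducing it would then depend on the add event being handled before the first heartbeat write after the reconnect.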
   
   

