Bulls1986 commented on pull request #6697: URL: https://github.com/apache/dolphinscheduler/pull/6697#issuecomment-962426641
On k8s, the worker always registers to zk using the same svc address as its host. When the job restarts, the zk listener is triggered, but the worker registers the same address again, and at that point the channel cache of the communication layer has not been cleared.

There are only two places that actually clean up the channel for a given host address. The first is the getChannel method: when it finds that the cached channel is not active, it resolves the domain name to an IP, establishes a new channel to that address, and overwrites the entry under the same key in the cache (the key here is the hostname). The second is the exceptionCaught method of the netty channel handler: it obtains the IP and port from the channel's remoteAddress and removes the entry from the channel cache using that assembled address. Note that an IP-based address may not be able to clear a channel that was cached with the hostname as the key, so the invalid channel stays in the channel cache.

Cleaning up invalid channels in the cache when the error count exceeds the limit therefore seems to be a more effective approach, and it also prevents the next schedule from acquiring the problematic channel.
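
The key mismatch described above can be reproduced without netty. The following is only a minimal sketch under stated assumptions, not the DolphinScheduler code: `ChannelCacheSketch`, `FakeChannel`, `removeByRemoteAddress`, `recordFailure`, the svc name, the IP, the port, and the limit of 2 are all hypothetical names and values chosen for illustration. It shows that a channel cached under `hostname:port` is not removed by a cleanup that builds its key from the remote `ip:port`, whereas an eviction keyed by the original hostname, triggered once failures exceed a limit, does remove it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChannelCacheSketch {

    /** Hypothetical stand-in for a netty Channel: only the remote address matters here. */
    static final class FakeChannel {
        final String remoteIp;
        final int remotePort;

        FakeChannel(String remoteIp, int remotePort) {
            this.remoteIp = remoteIp;
            this.remotePort = remotePort;
        }
    }

    // Cache keyed by the address the caller asked for (assumed here to be hostname:port).
    static final Map<String, FakeChannel> CHANNELS = new ConcurrentHashMap<>();
    // Failure counters, keyed the same way as the cache.
    static final Map<String, Integer> FAILURES = new ConcurrentHashMap<>();

    static String key(String host, int port) {
        return host + ":" + port;
    }

    /** Mimics an exceptionCaught-style cleanup that builds the key from the remote ip:port. */
    static boolean removeByRemoteAddress(FakeChannel channel) {
        return CHANNELS.remove(key(channel.remoteIp, channel.remotePort)) != null;
    }

    /** Counts a failure for host:port and evicts the cached channel once the limit is exceeded. */
    static boolean recordFailure(String host, int port, int limit) {
        String k = key(host, port);
        int failures = FAILURES.merge(k, 1, Integer::sum);
        if (failures > limit) {
            FAILURES.remove(k);
            return CHANNELS.remove(k) != null;
        }
        return false;
    }

    public static void main(String[] args) {
        String svc = "worker-0.worker-headless"; // hypothetical k8s svc address
        FakeChannel stale = new FakeChannel("10.0.0.12", 1234); // hypothetical pod IP and port
        CHANNELS.put(key(svc, 1234), stale);

        // The ip:port key does not match the hostname key, so nothing is removed.
        System.out.println(removeByRemoteAddress(stale));          // false
        System.out.println(CHANNELS.containsKey(key(svc, 1234)));  // true

        // With a limit of 2, the third failure evicts the stale channel.
        System.out.println(recordFailure(svc, 1234, 2));           // false
        System.out.println(recordFailure(svc, 1234, 2));           // false
        System.out.println(recordFailure(svc, 1234, 2));           // true
        System.out.println(CHANNELS.isEmpty());                    // true
    }
}
```

Keying the failure counter the same way as the cache is the point of the sketch: the eviction then never depends on what remoteAddress resolves to after the pod restarts.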
