Bulls1986 commented on pull request #6697:
URL: https://github.com/apache/dolphinscheduler/pull/6697#issuecomment-962426641


   On k8s, the worker always registers to zk using the same svc address as 
its host. When the worker restarts, the zk listener is triggered, but the 
worker registers the same address again, and at that point the channel cache 
of the communication layer has not been cleared.

   There are two places that actually clean up the channel for a given host 
address. The first is the getChannel method: when it finds that the cached 
channel is not active, it resolves the ip from the domain name, establishes 
a new channel to that address, and overwrites the entry under the same key in 
the cache (the key here is the hostname). The second is the exceptionCaught 
method of the netty channel: it uses the channel's remoteAddress to clear the 
cache, that is, it obtains the ip and port from remoteAddress and removes the 
entry from the channel cache using the assembled address. Note that an 
ip-based address may not be able to remove a channel whose key is the 
hostname, so the invalid channel stays in the channel cache.

   It therefore seems more effective to clean up invalid channels in the 
cache when the error count exceeds the limit, which also prevents the next 
schedule from acquiring the problematic channel.
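
   The key mismatch described above can be illustrated with a minimal 
sketch. It is not the DolphinScheduler code: the class ChannelCacheSketch, its 
method names, the hostname worker-0.worker-svc, the ip 10.0.0.5, and the 
failure limit are all hypothetical, and a plain String stands in for the netty 
channel. It only assumes that the cache is keyed by the registered host:port 
while the exception path rebuilds its key from the remote ip and port.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the client-side channel cache (illustration only).
class ChannelCacheSketch {
    // Key is "host:port" as registered in zk; on k8s the host is a service hostname.
    private final Map<String, String> channels = new ConcurrentHashMap<>();

    void put(String registeredAddress, String channel) {
        channels.put(registeredAddress, channel);
    }

    boolean contains(String registeredAddress) {
        return channels.containsKey(registeredAddress);
    }

    // Mimics the exceptionCaught path: the key is rebuilt from the remote ip and port.
    boolean removeByRemoteAddress(InetSocketAddress remote) {
        String key = remote.getAddress().getHostAddress() + ":" + remote.getPort();
        return channels.remove(key) != null;
    }

    // Proposed cleanup: remove by the registered key once failures exceed the limit.
    boolean removeIfOverLimit(String registeredAddress, int failures, int limit) {
        return failures > limit && channels.remove(registeredAddress) != null;
    }

    public static void main(String[] args) {
        ChannelCacheSketch cache = new ChannelCacheSketch();
        String key = "worker-0.worker-svc:1234"; // hypothetical svc address
        cache.put(key, "stale-channel");

        // An ip literal is used, so no DNS lookup takes place here.
        InetSocketAddress remote = new InetSocketAddress("10.0.0.5", 1234);

        System.out.println(cache.removeByRemoteAddress(remote)); // false: ip key does not match
        System.out.println(cache.contains(key));                 // true: stale entry survives
        System.out.println(cache.removeIfOverLimit(key, 4, 3));  // true: removed by hostname key
        System.out.println(cache.contains(key));                 // false
    }
}
```

   Removing by the registered key avoids depending on how the remote address 
is formatted, which is why cleanup on exceeding the error limit can succeed 
where the ip-based removal does not.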



