maxmzkr commented on a change in pull request #13725:
URL: https://github.com/apache/flink/pull/13725#discussion_r509370138
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/ZooKeeperLeaderRetrievalService.java
##########
@@ -203,6 +207,11 @@ protected void handleStateChange(ConnectionState newState)
{
}
}
+ private void onReconnectedConnectionState() {
+ // check whether we find some new leader information in
ZooKeeper
+ retrieveLeaderInformationFromZooKeeper();
Review comment:
When this is retrieved, I believe it will be equal to the last value
that the node cache had before disconnect except when the NodeCache has had
time to update the value. On reconnect, the NodeCache sets up an asynchronous
fetch of the value so the NodeCache will usually not have had time to update.
If the leader changes when the zookeeper seession is lost (which will always be
the case if the leader was the one that lost connection), the value that is
retrieved here will likely be wrong. It seems like notifyLossOfLeader was
implemented to prevent Flink nodes from having the wrong or stale leader. Is it
less of a big deal for the Flink node to have the wrong leader in this case
than it is in the case that notifyLossOfLeader is trying to solve?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]