mxm commented on a change in pull request #15675:
URL: https://github.com/apache/flink/pull/15675#discussion_r682601328



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/ZooKeeperLeaderRetrievalDriver.java
##########
@@ -155,9 +155,7 @@ private void handleStateChange(ConnectionState newState) {
                 LOG.debug("Connected to ZooKeeper quorum. Leader retrieval can start.");
                 break;
             case SUSPENDED:
-                LOG.warn(
-                        "Connection to ZooKeeper suspended. Can no longer retrieve the leader from "
-                                + "ZooKeeper.");
+                LOG.warn("Connection to ZooKeeper suspended, waiting for reconnection.");
                 leaderRetrievalEventHandler.notifyLeaderAddress(LeaderInformation.empty());

Review comment:
       This line needs to be removed: it notifies the listeners and runs their
handlers, which can, for example, trigger a job restart.
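
To illustrate the requested change, here is a minimal, self-contained sketch of a state handler that only logs on SUSPENDED and leaves the listeners alone. It is not the Flink code: `SuspendedHandlingSketch`, `Driver`, `NO_LEADER`, and the local `ConnectionState` enum are hypothetical stand-ins, and notifying the listener on LOST is an assumption made for the example, not something this review states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SuspendedHandlingSketch {

    /** Simplified stand-in for Curator's ConnectionState (not the real enum). */
    public enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Hypothetical marker standing in for LeaderInformation.empty(). */
    public static final String NO_LEADER = "<no leader>";

    /** Hypothetical driver that forwards leader changes to a single listener. */
    public static class Driver {
        private final Consumer<String> listener;

        public Driver(Consumer<String> listener) {
            this.listener = listener;
        }

        public void handleStateChange(ConnectionState newState) {
            switch (newState) {
                case SUSPENDED:
                    // Only log. The listener is not notified, so it keeps the
                    // last known leader and nothing downstream is restarted.
                    System.out.println("Connection suspended, waiting for reconnection.");
                    break;
                case LOST:
                    // Assumption for this sketch: the leader is cleared only
                    // once the connection is reported as lost.
                    listener.accept(NO_LEADER);
                    break;
                default:
                    // Other states need no listener notification in this sketch.
                    break;
            }
        }
    }

    public static void main(String[] args) {
        List<String> notifications = new ArrayList<>();
        Driver driver = new Driver(notifications::add);

        driver.handleStateChange(ConnectionState.SUSPENDED);
        System.out.println("after SUSPENDED: " + notifications.size() + " notification(s)");

        driver.handleStateChange(ConnectionState.LOST);
        System.out.println("after LOST: " + notifications.size() + " notification(s)");
    }
}
```

With the notification removed from the SUSPENDED branch, a short connection hiccup produces a log line but no listener callback.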

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/util/ZooKeeperUtils.java
##########
@@ -150,6 +151,7 @@ public static CuratorFramework startCuratorFramework(Configuration configuration
                         .sessionTimeoutMs(sessionTimeout)
                         .connectionTimeoutMs(connectionTimeout)
                         .retryPolicy(new ExponentialBackoffRetry(retryWait, maxRetryAttempts))
+                        .connectionStateErrorPolicy(new SessionConnectionStateErrorPolicy())

Review comment:
       For anyone wondering: with this policy, Curator treats SUSPENDED as
non-fatal and only LOST as an error, so the leader keeps its leadership while
the connection is suspended.
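
The difference between the two error policies can be sketched in a few lines. This is a simplified model, not Curator itself: the `ErrorPolicy` interface and `ConnectionState` enum below are local stand-ins that mirror the shape of Curator's `ConnectionStateErrorPolicy.isErrorState(ConnectionState)`, and `ErrorPolicySketch`, `STANDARD`, `SESSION`, and `keepsLeadership` are hypothetical names for the example.

```java
public class ErrorPolicySketch {

    /** Simplified stand-in for Curator's ConnectionState (not the real enum). */
    public enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

    /** Simplified stand-in for Curator's ConnectionStateErrorPolicy. */
    public interface ErrorPolicy {
        boolean isErrorState(ConnectionState state);
    }

    /** Models the standard policy: both SUSPENDED and LOST count as errors. */
    public static final ErrorPolicy STANDARD =
            state -> state == ConnectionState.SUSPENDED || state == ConnectionState.LOST;

    /** Models the session policy: only LOST counts as an error. */
    public static final ErrorPolicy SESSION =
            state -> state == ConnectionState.LOST;

    /** In this model a leader keeps leadership unless the policy reports an error. */
    public static boolean keepsLeadership(ErrorPolicy policy, ConnectionState state) {
        return !policy.isErrorState(state);
    }

    public static void main(String[] args) {
        for (ConnectionState state : ConnectionState.values()) {
            System.out.println(state
                    + ": standard keeps leadership = " + keepsLeadership(STANDARD, state)
                    + ", session keeps leadership = " + keepsLeadership(SESSION, state));
        }
    }
}
```

In this model the two policies disagree only on SUSPENDED, which is the case the comment above describes.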




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.