tillrohrmann commented on a change in pull request #16801:
URL: https://github.com/apache/flink/pull/16801#discussion_r688573729



##########
File path: docs/layouts/shortcodes/generated/high_availability_configuration.html
##########
@@ -62,6 +62,12 @@
             <td>Integer</td>
             <td>Defines the session timeout for the ZooKeeper session in ms.</td>
         </tr>
+        <tr>
+            <td><h5>high-availability.zookeeper.client.tolerate-suspended-connections</h5></td>
+            <td style="word-wrap: break-word;">false</td>
+            <td>Boolean</td>
+            <td>Defines whether a suspended ZooKeeper connection will be treated as an error that causes the leader information to be invalidated or not. In case you set this option to <code class="highlighter-rouge">true</code>, Flink will wait until a ZooKeeper connection is marked as lost before it revokes the leadership of components. This has the effect that Flink is more resilient against temporary connection instabilities at the cost of running more likely into timing issues with ZooKeeper.</td>

Review comment:
       I quote from the Curator documentation:
   
   > Curator has a pluggable error policy. The default policy takes the conservative approach of treating connection states SUSPENDED and LOST the same way. i.e. when a recipe sees the state change to SUSPENDED it will assume that the ZooKeeper session is lost and will clean up any watchers, nodes, etc. You can choose, however, a more aggressive approach by setting the error policy to only treat LOST (i.e. true session loss) as an error state.
   
   I guess the risk is that you lose some of the safety margin in timing between the ZK cluster and your client. For example, ephemeral znodes are deleted once a client session expires, which effectively starts another round of leader election. If the old leader now only revokes its leadership once the connection is lost, it is more likely to react late and revoke the leadership only after another component has already obtained it (disclaimer: I haven't looked into the ZK code).
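
   To make the difference between the two policies concrete, here is a minimal, self-contained sketch. It does not use Curator or Flink classes: the class `SuspendedPolicySketch`, the `State` enum, and both helper methods are hypothetical names invented for illustration, and the `tolerateSuspended` flag only stands in for the new option. It models which connection states would count as an error, and therefore trigger a leadership revocation, under each setting.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative model only; these are NOT Curator or Flink types.
class SuspendedPolicySketch {

    /** Simplified connection states, named after the states in the quote above. */
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Hypothetical helper: the states treated as errors for a given setting. */
    static Set<State> errorStates(boolean tolerateSuspended) {
        return tolerateSuspended
                ? EnumSet.of(State.LOST)                    // only true session loss
                : EnumSet.of(State.SUSPENDED, State.LOST);  // conservative default
    }

    /** Hypothetical helper: would the old leader give up leadership on this state? */
    static boolean revokesLeadership(boolean tolerateSuspended, State state) {
        return errorStates(tolerateSuspended).contains(state);
    }

    public static void main(String[] args) {
        for (boolean tolerate : new boolean[] {false, true}) {
            for (State state : State.values()) {
                System.out.println("tolerateSuspended=" + tolerate
                        + " state=" + state
                        + " revoke=" + revokesLeadership(tolerate, state));
            }
        }
    }
}
```

   With the flag set to `true` in this model, a `SUSPENDED` connection no longer revokes leadership, so the old leader keeps acting as leader until `LOST` arrives. That window is where the timing risk described above would come from.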




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
