symat commented on a change in pull request #1228: ZOOKEEPER-3698: fixing 
NoRouteToHostException when starting large cluster locally
URL: https://github.com/apache/zookeeper/pull/1228#discussion_r368575308
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1542,6 +1555,22 @@ the variable does.
     ZAB protocol and the Fast Leader Election protocol. Default
     value is **false**.
 
+* *multiAddress.reachabilityCheckEnabled* :
+    (Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
+    **New in 3.6.0:**
+    Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
+    for each ZooKeeper server instance (this can increase availability when multiple physical
+    network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
+    or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find
+    the reachable addresses. This happens only if you provide multiple addresses in the configuration.
+    The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to
+    start a large (e.g. 11+) ensemble members cluster on a single machine for testing.
+
+    Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks.
+    Please note, disabling the reachability check will cause the cluster not to be able to reconfigure
+    itself properly during network problems, so the disabling is advised only during testing.
 
 Review comment:
   Thanks for checking! :)
   
   The whole purpose of the multi-address feature is to always try to use an address that works. For leader election, the current implementation always filters the address list using `InetAddress.isReachable()` calls to find out which server address is working. Depending on the native implementation, this results in ICMP ECHO requests or TCP connections to port 7 (Echo) of the destination host (see: https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#isReachable(int)).
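
   To make the filtering step concrete, here is a minimal sketch of the idea, not the actual code in the PR. The class and method names (`ReachabilityFilter`, `filterReachable`, `isReachableWithin`) are hypothetical; only `InetAddress.isReachable(int)` is the real JDK call discussed above. The probe is passed in as a predicate so that it can be swapped or disabled.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not ZooKeeper's actual class.
final class ReachabilityFilter {

    // Probe backed by InetAddress.isReachable(int): ICMP ECHO or a TCP
    // connection to port 7, depending on the native implementation.
    static Predicate<InetAddress> isReachableWithin(int timeoutMs) {
        return address -> {
            try {
                return address.isReachable(timeoutMs);
            } catch (IOException e) {
                // A network error is treated as "not reachable".
                return false;
            }
        };
    }

    // Keeps only the candidates whose resolved address passes the probe.
    static List<InetSocketAddress> filterReachable(
            List<InetSocketAddress> candidates, Predicate<InetAddress> probe) {
        List<InetSocketAddress> reachable = new ArrayList<>();
        for (InetSocketAddress candidate : candidates) {
            InetAddress address = candidate.getAddress(); // null if unresolved
            if (address != null && probe.test(address)) {
                reachable.add(candidate);
            }
        }
        return reachable;
    }
}
```

   With this shape, the normal call would be `filterReachable(addresses, isReachableWithin(timeoutMs))`, while a disabled reachability check would correspond to a probe that accepts every address.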
   
   So if `InetAddress.isReachable()` cannot reach the host, the current multi-address feature will not be able to treat the given address as a working one. Basically, right now it cannot distinguish between a broken network link (when the whole node is unreachable) and disabled ICMP (when only ICMP and port 7 are blocked by the firewall of the destination host). I am not an expert in cluster / firewall operation, so I cannot tell how serious this limitation is.
   
   One way to improve this could be to implement something like the `ruok` 4LW command for the server ports: a simple request-response message that only shows that the server is alive and listening on the given election / quorum port. Then we could use that instead of the ICMP calls. I think this would be a reasonable improvement, but maybe more as a separate task, out of the scope of 3.6.0.
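
   As a rough illustration of the direction, the sketch below checks whether something is listening on a given port using a plain TCP connect. This is a simpler stand-in, not the `ruok`-style request-response exchange proposed above, and `PortProbe` / `canConnect` are hypothetical names. It only shows how a port-level check could avoid depending on ICMP or port 7.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class PortProbe {

    // Returns true if a TCP connection to the target port can be established
    // within the timeout. This shows that something is listening there, but
    // it does not prove that the listener is a healthy ZooKeeper server.
    static boolean canConnect(InetSocketAddress target, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host.
            return false;
        }
    }
}
```

   A real implementation would still need an application-level reply to tell a live server apart from any other process bound to the same port.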
   
   What do you think?
   
   (Also: do you think I should extend the documentation, or did you just want to elaborate here in the PR?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
