Hoang Dang created ZOOKEEPER-3778:
-------------------------------------
Summary: Cannot upgrade from 3.5.7 to 3.6.0 due to
multiAddress.reachabilityCheckEnabled
Key: ZOOKEEPER-3778
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3778
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Hoang Dang
I upgraded our cluster from 3.5.7 to 3.6.0. I made a small change to the config for
metricsProvider (Prometheus), which I did not expect to affect the cluster's
functionality. However, we got the following error log:
{code:java}
2020-04-01 04:04:57,892 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@292]
- shutdown Follower
2020-04-01 04:04:57,892 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863]
- Peer state changed: looking
2020-04-01 04:04:57,892 [myid:1] - WARN
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1501]
- PeerState set to LOOKING
2020-04-01 04:04:57,892 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371]
- LOOKING
2020-04-01 04:04:57,892 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@931]
- New election. My id = 1, proposed zxid=0x140000044b
2020-04-01 04:04:57,894 [myid:1] - INFO
[WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] -
Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:$
2020-04-01 04:04:57,895 [myid:1] - INFO
[WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] -
Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:3, n.roun$
2020-04-01 04:04:57,896 [myid:1] - INFO
[WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] -
Notification: my state:LOOKING; n.sid:3, n.state:LEADING, n.leader:3, n.round:$
2020-04-01 04:04:57,896 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@857]
- Peer state changed: following
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1453]
- FOLLOWING
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1246]
- minSessionTimeout set to 4000
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1255]
- maxSessionTimeout set to 40000
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ResponseCache@45]
- Response cache size is initialized with value 400.
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ResponseCache@45]
- Response cache size is initialized with value 400.
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@111]
- zookeeper.pathStats.slotCapacity = 60
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@112]
- zookeeper.pathStats.slotDuration = 15
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@113]
- zookeeper.pathStats.maxDepth = 6
2020-04-01 04:04:57,897 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@114]
- zookeeper.pathStats.initialDelay = 5
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@115]
- zookeeper.pathStats.delay = 5
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@116]
- zookeeper.pathStats.enabled = false
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1470]
- The max bytes for all large requests are set to 104857600
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1484]
- The large request threshold is set to -1
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@329]
- Created server with tickTime 2000 minSessionTimeout 4000 maxSes$
2020-04-01 04:04:57,898 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@75]
- FOLLOWING - LEADER ELECTION TOOK - 5 MS
2020-04-01 04:04:57,899 [myid:1] - INFO
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863]
- Peer state changed: following - discovery
2020-04-01 04:04:57,900 [myid:1] - WARN
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@129]
- Exception when following the leader
java.lang.IllegalArgumentException
at
java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1295)
at
java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1181)
at
java.base/java.util.concurrent.Executors.newFixedThreadPool(Executors.java:92)
at
org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:275)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:87)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1455)
{code}
I then checked the code in Learner.java
[here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java]:
{code:java}
if (self.isMultiAddressReachabilityCheckEnabled()) {
    // even if none of the addresses are reachable, we want to try to establish connection
    // see ZOOKEEPER-3758
    addresses = multiAddr.getAllReachableAddressesOrAll();
} else {
    addresses = multiAddr.getAllAddresses();
}
ExecutorService executor = Executors.newFixedThreadPool(addresses.size());
{code}
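The stack trace is consistent with _addresses_ being empty at this point: Executors.newFixedThreadPool rejects a thread count of 0 with an IllegalArgumentException. A minimal standalone sketch of that behavior follows. The class name EmptyPoolDemo, the helper poolCreationFails, and the sample address are made up for illustration and are not ZooKeeper code.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EmptyPoolDemo {

    // Hypothetical helper: returns true if sizing a fixed thread pool to the
    // list fails with IllegalArgumentException, false if the pool is created.
    static boolean poolCreationFails(List<String> addresses) {
        try {
            ExecutorService executor = Executors.newFixedThreadPool(addresses.size());
            executor.shutdown(); // no tasks were submitted; release the pool
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An empty list means a pool size of 0, which the JDK rejects.
        System.out.println(poolCreationFails(Collections.<String>emptyList()));
        // One (made-up) address means a pool size of 1, which is accepted.
        System.out.println(poolCreationFails(Collections.singletonList("zk3.example:2888")));
    }
}
```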
I suspect the problem is related to *multiAddress.reachabilityCheckEnabled*,
so I turned it *off (false)*. After that, the cluster started as
expected.
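For reference, the workaround amounts to a single configuration line. The fragment below is a sketch that assumes the setting was placed in zoo.cfg; the exact file and mechanism used on the affected cluster are not stated here.

```
# zoo.cfg (assumed location of the setting)
multiAddress.reachabilityCheckEnabled=false
```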
So could you please:
* Update the documentation
[here|http://zookeeper.apache.org/doc/r3.6.0/zookeeperAdmin.html] for
_multiAddress.reachabilityCheckEnabled_, because it takes effect even when
_multiAddress.enabled=false_ (which is the default)
* Check the code in Learner.java to make sure _addresses.size()_ is always
greater than 0
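
The second request could look roughly like the guard below. This is only a sketch under stated assumptions, not a proposed patch: the class LeaderConnectGuard, the method newConnectExecutor, the error message, and the sample host name are all hypothetical, and the real fix may belong elsewhere (for example, in how the address list is built).

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LeaderConnectGuard {

    // Hypothetical helper: fail with a descriptive message instead of the bare
    // IllegalArgumentException thrown by Executors.newFixedThreadPool(0).
    static ExecutorService newConnectExecutor(Collection<InetSocketAddress> addresses) {
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalStateException(
                "No leader address available to connect to; "
                + "check multiAddress.reachabilityCheckEnabled and the server list");
        }
        return Executors.newFixedThreadPool(addresses.size());
    }

    public static void main(String[] args) {
        // Made-up, unresolved address used only to exercise the non-empty path.
        InetSocketAddress leader = InetSocketAddress.createUnresolved("zk3.example", 2888);
        ExecutorService executor = newConnectExecutor(Collections.singleton(leader));
        executor.shutdown();

        try {
            newConnectExecutor(Collections.<InetSocketAddress>emptySet());
        } catch (IllegalStateException e) {
            System.out.println("Rejected empty address list: " + e.getMessage());
        }
    }
}
```

With a check like this, an empty address list would surface as an explicit error that names the relevant setting, rather than as an unexplained IllegalArgumentException from the thread pool constructor.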