[
https://issues.apache.org/jira/browse/ZOOKEEPER-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460650#comment-17460650
]
Denis Gorobets commented on ZOOKEEPER-4371:
-------------------------------------------
I have a 3-node cluster with IDs 1, 2, and 3. In this example, node 1 is the leader.
ZooKeeper version: 3.6.3.
# I shut down node 1.
# Nodes 2 and 3 elect a new leader.
# Node 3 becomes the new leader.
# ZooKeeper works with 2 nodes.
# I restore node 1.
# I get these errors in the logs:
node 3 (new leader):
{code:java}
We got a connection request from a server with our own ID. This should be
either a configuration error, or a bug. {code}
node 1 (old leader):
{code:java}
2021-12-15 08:10:05,983 [myid:1] - INFO
[QuorumConnectionThread-[myid=1]-31:QuorumCnxManager@513] - Have smaller server
identifier, so dropping the connection: (myId:1 --> sid:2)
2021-12-15 08:10:05,988 [myid:1] - INFO
[QuorumConnectionThread-[myid=1]-32:QuorumCnxManager@513] - Have smaller server
identifier, so dropping the connection: (myId:1 --> sid:3) {code}
After I restart the leader (node 3), it works.
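
For anyone reading the two log excerpts above: both messages appear to come from a tie-break on server IDs that is applied to election-port connections. The following is a minimal sketch of that rule as I understand it, not ZooKeeper's actual QuorumCnxManager code. The class, enum, and method names are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: this is NOT ZooKeeper's QuorumCnxManager code.
// QuorumTieBreakSketch, Action, onOutgoing and onIncoming are hypothetical names.
final class QuorumTieBreakSketch {

    enum Action {
        KEEP,                 // keep the socket and start exchanging election messages
        DROP_AND_WAIT,        // dialer has the smaller ID: drop and wait to be dialed back
        CLOSE_AND_DIAL_BACK,  // listener has the larger ID: close and open its own connection
        OWN_ID_WARNING        // peer announced the listener's own ID
    }

    // Dialing side: server myId has opened an election connection to remoteId.
    static Action onOutgoing(long myId, long remoteId) {
        // Matches "Have smaller server identifier, so dropping the connection".
        return remoteId > myId ? Action.DROP_AND_WAIT : Action.KEEP;
    }

    // Listening side: a peer that announces claimedId has connected to server myId.
    static Action onIncoming(long myId, long claimedId) {
        if (claimedId < myId) {
            return Action.CLOSE_AND_DIAL_BACK;
        }
        if (claimedId == myId) {
            // Matches "We got a connection request from a server with our own ID".
            return Action.OWN_ID_WARNING;
        }
        return Action.KEEP;
    }

    public static void main(String[] args) {
        // Node 1 comes back and dials nodes 2 and 3.
        System.out.println("node 1 dials node 2: " + onOutgoing(1, 2));
        System.out.println("node 1 dials node 3: " + onOutgoing(1, 3));
        // Node 3 accepts node 1 and is expected to dial back.
        System.out.println("node 3 accepts id 1: " + onIncoming(3, 1));
        // The reported symptom: node 3 sees a peer that announces ID 3.
        System.out.println("node 3 accepts id 3: " + onIncoming(3, 3));
    }
}
```

Under this reading, the messages on node 1 are expected when it rejoins, because it has the smallest ID. The unexpected part is the own-ID warning on node 3, which suggests that the connection node 3 opens back towards node 1 ends up at node 3 itself.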
> False ID conflict when ZK try to connect to cluster
> ---------------------------------------------------
>
> Key: ZOOKEEPER-4371
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4371
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.6.2
> Environment: We run ZooKeeper 3.6.2 on Docker with the official image.
> It can be reproduced with something like
>
> {code:java}
> docker run --add-host=zk_fqdn:zk_ip --ulimit nofile=64000:64000 -p
> ip_zk:2181:2181 -p ip_zk:2888:2888 -p ip_zk:3888:3888 -p ip_zk:7000:7000 -p
> ip_zk:8080:8080 -v /data/zookeeper/data:/data -v /data/zookeeper/log:/datalog
> --hostname "zk_fqdn" --env-file "/data/zookeeper/conf/zk.env" --name zookeeper
> zookeeper:3.6.2
> {code}
> with
> {code:java}
> ZOO_MY_ID=zk_id
> ZOO_INIT_LIMIT=10
> ZOO_SYNC_LIMIT=5
> ZOO_MAX_CLIENT_CNXNS=0
> ZOO_4LW_COMMANDS_WHITELIST=stat,mntr,conf,ruok
> ZOO_STANDALONE_ENABLED=False
> ZOO_CFG_EXTRA=metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
> metricsProvider.httpPort=7000 metricsProvider.exportJvmInfo=true
> ZOO_SERVERS=server.1=zk1_fqdn:2888:3888;2181 server.2=zk2_fqdn:2888:3888;2181
> server.3=zk3_fqdn:2888:3888;2181
> {code}
>
> Reporter: Tifenn LE GOFF
> Priority: Major
>
> Some ZK nodes cannot join the cluster after a while
> {code:java}
> echo stat|nc $HOSTNAME 2181
> This ZooKeeper instance is not currently serving requests
> {code}
> We have 3 ZK nodes: zk1 with ID 1, zk2 with ID 2, and zk3 with ID 3.
> ZK2 and ZK3 are already running. When ZK1 connects to the cluster, we see
> {code:java}
> 2021-09-07 13:33:09,585 [myid:1] - INFO
> [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@979]
> - Notification time out: 60000
> 2021-09-07 13:33:09,586 [myid:1] - INFO
> [QuorumConnectionThread-[myid=1]-55:QuorumCnxManager@513] - Have smaller
> server identifier, so dropping the connection: (myId:1 --> sid:2)
> 2021-09-07 13:33:09,586 [myid:1] - INFO
> [QuorumConnectionThread-[myid=1]-56:QuorumCnxManager@513] - Have smaller
> server identifier, so dropping the connection: (myId:1 --> sid:3)
> 2021-09-07 13:33:30,269 [myid:1] - WARN
> [NIOWorkerThread-1:NIOServerCnxn@373] - Close of session 0x0
> java.io.IOException: ZooKeeperServer not running
> at
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
> at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> at
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> 2021-09-07 13:33:30,941 [myid:1] - WARN
> [NIOWorkerThread-2:NIOServerCnxn@373] - Close of session 0x0
> java.io.IOException: ZooKeeperServer not running
> at
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
> at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> at
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$W{code}
> and on zk2 (the current leader), we have
> {code:java}
> 2021-09-07 13:33:09,587 [myid:2] - INFO
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager$Listener$ListenerHandler@1070]
> - Received connection request from /ip-zk1:53102
> 2021-09-07 13:33:09,588 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@1281] - Interrupted while waiting
> for message on queue
> java.lang.InterruptedException
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown
> Source)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
> Source)
> at
> org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1446)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1270)
> 2021-09-07 13:33:09,588 [myid:2] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@1293] - Send worker leaving thread
> id 1 my id = 2
> 2021-09-07 13:33:09,588 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1395] - Connection broken for id 1,
> my id = 2
> java.net.SocketException: Socket closed
> at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
> at java.base/java.net.SocketInputStream.read(Unknown Source)
> at java.base/java.net.SocketInputStream.read(Unknown Source)
> at java.base/java.io.BufferedInputStream.fill(Unknown Source)
> at java.base/java.io.BufferedInputStream.read(Unknown Source)
> at java.base/java.io.DataInputStream.readInt(Unknown Source)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1383)
> 2021-09-07 13:33:09,589 [myid:2] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1401] - Interrupting SendWorker
> thread from RecvWorker. sid: 1. myId: 2
> 2021-09-07 13:33:09,589 [myid:2] - INFO
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager$Listener$ListenerHandler@1070]
> - Received connection request from /172.17.0.2:35380
> 2021-09-07 13:33:09,590 [myid:2] - WARN
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager@662] - We got a
> connection request from a server with our own ID. This should be either a
> configuration error, or a bug.
> {code}
> If we restart the leader, it works. This issue has happened very often since we
> migrated our ZK services to Docker instances.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)