[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460650#comment-17460650
 ] 

Denis Gorobets commented on ZOOKEEPER-4371:
-------------------------------------------

I have a 3-node cluster with IDs 1, 2, and 3. Suppose node 1 is the leader. 
ZooKeeper version: 3.6.3. Steps to reproduce:
 # I shut down node 1
 # Nodes 2 and 3 elect a new leader
 # Node 3 becomes the new leader
 # ZooKeeper works with 2 nodes
 # I bring node 1 back up
 # I get errors in the logs:

node 3 (new leader):
{code:java}
We got a connection request from a server with our own ID. This should be 
either a configuration error, or a bug. {code}
 

node 1 (old leader):
{code:java}
2021-12-15 08:10:05,983 [myid:1] - INFO 
[QuorumConnectionThread-[myid=1]-31:QuorumCnxManager@513] - Have smaller server 
identifier, so dropping the connection: (myId:1 --> sid:2)
2021-12-15 08:10:05,988 [myid:1] - INFO 
[QuorumConnectionThread-[myid=1]-32:QuorumCnxManager@513] - Have smaller server 
identifier, so dropping the connection: (myId:1 --> sid:3) {code}
After I restart the leader (node 3), it works.
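
The two messages above can be read against a simplified model of how quorum peers decide which side keeps an election connection. This is only a sketch under assumptions: the function names and return strings are hypothetical and are not ZooKeeper's code. It does not explain why node 3 saw its own ID on an incoming socket, only which branch each log line corresponds to.

```python
def on_outgoing_connect(my_id: int, remote_id: int) -> str:
    """Decision taken by the peer that opened the socket (assumes remote_id != my_id)."""
    if remote_id > my_id:
        # Corresponds to "Have smaller server identifier, so dropping the connection"
        return "drop"
    return "keep"


def on_incoming_connect(my_id: int, claimed_id: int) -> str:
    """Decision taken by the peer that accepted the socket."""
    if claimed_id == my_id:
        # Corresponds to "We got a connection request from a server with our own ID"
        return "reject-own-id"
    if claimed_id < my_id:
        # The peer with the larger ID closes the socket and connects back itself
        return "close-and-dial-back"
    return "keep"


if __name__ == "__main__":
    # Node 1 coming back and contacting nodes 2 and 3
    print(on_outgoing_connect(1, 2), on_outgoing_connect(1, 3))
    # Node 3 receiving a socket that announces ID 3
    print(on_incoming_connect(3, 3))
```

In this model node 1 always drops its own outgoing connections and depends on nodes 2 and 3 dialing back, so a failed dial-back on the leader would leave node 1 outside the quorum.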

> False ID conflict when ZK try to connect to cluster
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-4371
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4371
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.6.2
>         Environment: We are running ZooKeeper 3.6.2 on Docker with the official image. 
> It can be reproduced with something like
>  
> {code:java}
> docker run --add-host=zk_fqdn:zk_ip --ulimit nofile=64000:64000 -p 
> ip_zk:2181:2181 -p ip_zk:2888:2888 -p ip_zk:3888:3888 -p ip_zk:7000:7000 -p 
> ip_zk:8080:8080 -v /data/zookeeper/data:/data -v /data/zookeeper/log:/datalog 
> --hostname "zk_fqdn" --env-file "/data/zookeeper/conf/zk.env" --name zookeeper 
> zookeeper:3.6.2
> {code}
> with
> {code:java}
> ZOO_MY_ID=zk_id
> ZOO_INIT_LIMIT=10
> ZOO_SYNC_LIMIT=5
> ZOO_MAX_CLIENT_CNXNS=0
> ZOO_4LW_COMMANDS_WHITELIST=stat,mntr,conf,ruok
> ZOO_STANDALONE_ENABLED=False
> ZOO_CFG_EXTRA=metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
>  metricsProvider.httpPort=7000 metricsProvider.exportJvmInfo=true
> ZOO_SERVERS=server.1=zk1_fqdn:2888:3888;2181 server.2=zk2_fqdn:2888:3888;2181 
> server.3=zk3_fqdn:2888:3888;2181 
> {code}
>  
>            Reporter: Tifenn LE GOFF
>            Priority: Major
>
> Some ZK nodes cannot rejoin the cluster after a while
> {code:java}
> echo stat|nc $HOSTNAME 2181
> This ZooKeeper instance is not currently serving requests
> {code}
> We have 3 ZK nodes: zk1 with ID 1, zk2 with ID 2, and zk3 with ID 3.
> ZK2 and ZK3 are already running. When ZK1 connects to the cluster, it logs
> {code:java}
> 2021-09-07 13:33:09,585 [myid:1] - INFO  
> [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@979]
>  - Notification time out: 60000
> 2021-09-07 13:33:09,586 [myid:1] - INFO  
> [QuorumConnectionThread-[myid=1]-55:QuorumCnxManager@513] - Have smaller 
> server identifier, so dropping the connection: (myId:1 --> sid:2)
> 2021-09-07 13:33:09,586 [myid:1] - INFO  
> [QuorumConnectionThread-[myid=1]-56:QuorumCnxManager@513] - Have smaller 
> server identifier, so dropping the connection: (myId:1 --> sid:3)
> 2021-09-07 13:33:30,269 [myid:1] - WARN  
> [NIOWorkerThread-1:NIOServerCnxn@373] - Close of session 0x0
> java.io.IOException: ZooKeeperServer not running
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
>       at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>       at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>       at java.base/java.lang.Thread.run(Unknown Source)
> 2021-09-07 13:33:30,941 [myid:1] - WARN  
> [NIOWorkerThread-2:NIOServerCnxn@373] - Close of session 0x0
> java.io.IOException: ZooKeeperServer not running
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
>       at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>       at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$W{code}
> and on zk2 (the current leader), we have
> {code:java}
> 2021-09-07 13:33:09,587 [myid:2] - INFO  
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager$Listener$ListenerHandler@1070]
>  - Received connection request from /ip-zk1:53102
> 2021-09-07 13:33:09,588 [myid:2] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@1281] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
>       at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown
>  Source)
>       at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
>  Source)
>       at 
> org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1446)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1270)
> 2021-09-07 13:33:09,588 [myid:2] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@1293] - Send worker leaving thread 
> id 1 my id = 2
> 2021-09-07 13:33:09,588 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1395] - Connection broken for id 1, 
> my id = 2
> java.net.SocketException: Socket closed
>       at java.base/java.net.SocketInputStream.socketRead0(Native Method)
>       at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
>       at java.base/java.net.SocketInputStream.read(Unknown Source)
>       at java.base/java.net.SocketInputStream.read(Unknown Source)
>       at java.base/java.io.BufferedInputStream.fill(Unknown Source)
>       at java.base/java.io.BufferedInputStream.read(Unknown Source)
>       at java.base/java.io.DataInputStream.readInt(Unknown Source)
>       at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1383)
> 2021-09-07 13:33:09,589 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1401] - Interrupting SendWorker 
> thread from RecvWorker. sid: 1. myId: 2
> 2021-09-07 13:33:09,589 [myid:2] - INFO  
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager$Listener$ListenerHandler@1070]
>  - Received connection request from /172.17.0.2:35380
> 2021-09-07 13:33:09,590 [myid:2] - WARN  
> [ListenerHandler-fqdn-zk2/172.17.0.2:3888:QuorumCnxManager@662] - We got a 
> connection request from a server with our own ID. This should be either a 
> configuration error, or a bug.
> {code}
> If we restart the leader, it works. This issue has happened very often since we 
> migrated our ZK services to Docker instances.
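
Because the warning names a configuration error as one possible cause, a first step is to rule out an inconsistent server list. The sketch below is a hypothetical helper, not part of ZooKeeper or the Docker image. It assumes the space-separated `server.N=host:port:port;clientPort` form shown in `ZOO_SERVERS` above and does not handle other address forms. A clean result does not exclude name-resolution problems inside the containers.

```python
import re
from collections import Counter
from typing import Dict, List

# Matches only the entry form used in the report: server.N=host:2888:3888;2181
SERVER_RE = re.compile(r"^server\.(\d+)=([^:;\s]+):(\d+):(\d+)(?:;(\d+))?$")


def parse_zoo_servers(value: str) -> Dict[int, str]:
    """Map each server ID to its host name; raise on malformed or duplicate entries."""
    servers: Dict[int, str] = {}
    for entry in value.split():
        match = SERVER_RE.match(entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        sid = int(match.group(1))
        if sid in servers:
            raise ValueError(f"duplicate server id: {sid}")
        servers[sid] = match.group(2)
    return servers


def check_ensemble(zoo_servers: str, my_id: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    servers = parse_zoo_servers(zoo_servers)
    problems: List[str] = []
    if my_id not in servers:
        problems.append(f"ZOO_MY_ID={my_id} is not listed in ZOO_SERVERS")
    for host, count in sorted(Counter(servers.values()).items()):
        if count > 1:
            problems.append(f"host {host} is used by {count} server ids")
    return problems


if __name__ == "__main__":
    zoo_servers = (
        "server.1=zk1_fqdn:2888:3888;2181 "
        "server.2=zk2_fqdn:2888:3888;2181 "
        "server.3=zk3_fqdn:2888:3888;2181"
    )
    print(check_ensemble(zoo_servers, my_id=1))
```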



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
