[
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590037#comment-16590037
]
Oded commented on ZOOKEEPER-3036:
---------------------------------
Hope this helps. Before we hit the split brain, we saw these log entries:
kafka-cluster-zookeeper-1 zookeeper 2018-08-22 20:04:46,623 [myid:2] - INFO
[ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@648] - Got user-level
KeeperException when processing sessionid:0x26561ad5c49000f type:ping
cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
Error Path:null Error:KeeperErrorCode = Session moved
kafka-cluster-zookeeper-1 zookeeper 2018-08-22 20:05:50,885 [myid:2] - INFO
[ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@648] - Got user-level
KeeperException when processing sessionid:0x16561ad5c490007 type:delete
cxid:0xabc zxid:0x100000cbe txntype:-1 reqpath:n/a Error
Path:/config/changes/config_change_0000000001 Error:KeeperErrorCode = NoNode
for /config/changes/config_change_0000000001
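
In case it is useful for confirming a split brain like the one mentioned above, here is a minimal sketch that asks each ensemble member for its role and flags the case where more than one server reports itself as leader. It assumes the `srvr` four-letter command is enabled on every server and reachable on the client port. The host names and the helper names (`query_srvr`, `parse_mode`, `find_leaders`) are hypothetical and not taken from our setup.

```python
import socket


def query_srvr(host, port=2181, timeout=5.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leaders(replies):
    """Given {host: srvr reply text}, return the sorted hosts reporting 'leader'."""
    return sorted(h for h, r in replies.items() if parse_mode(r) == "leader")


if __name__ == "__main__":
    # Hypothetical host names; replace with the real ensemble members.
    hosts = ["zk-0.example", "zk-1.example", "zk-2.example"]
    replies = {}
    for host in hosts:
        try:
            replies[host] = query_srvr(host)
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")
    leaders = find_leaders(replies)
    if len(leaders) > 1:
        print("More than one leader reported:", ", ".join(leaders))
    else:
        print("Leaders reported:", leaders)
```

A healthy ensemble should report exactly one leader; zero leaders usually means an election is in progress or the quorum is lost.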
> Unexpected exception in zookeeper
> ---------------------------------
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.4.10
> Environment: 3 ZooKeeper servers, 5 Kafka servers
> Reporter: Oded
> Priority: Critical
>
> We hit an issue with one of the ZooKeeper servers (the leader) that caused
> the entire Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE
> /192.168.0.91:42490 ********
>
> We would expect ZooKeeper to elect another leader and the Kafka
> cluster to keep working, but that was not the case.
>