[
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645913#comment-16645913
]
Aishwarya Soni edited comment on ZOOKEEPER-3036 at 10/11/18 3:41 AM:
---------------------------------------------------------------------
We got the same issue a couple of days back. We are running ZooKeeper in a
containerized AWS environment and had to restart the problem container to
resolve it. The issue appears when the port binding fails: when the container
becomes unhealthy, it never releases the port, so when the server tries to bind
to that port again to rejoin the quorum, the bind fails because the port is
still in use, and we end up with the *Unexpected exception causing shutdown
while sock still open* error.

The binding happens in the QuorumCnxManager class in ZooKeeper:
*ss.socket().bind(new InetSocketAddress(port));*

In the LearnerHandler.java class, the handler then hits the still-open socket
and logs the exception:
*if (sock != null && !sock.isClosed()) { LOG.error("Unexpected exception
causing shutdown while sock " + "still open", e); }*

In most of these cases, sock is not null and still open, so this branch is
taken.
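To make the failure mode concrete, here is a minimal standalone Java sketch
(not ZooKeeper code; the class name and port 3888 are illustrative only) of
what happens when a stale holder never releases the port and a second bind is
attempted:
{code:java}
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Standalone sketch, not ZooKeeper code: the first bind stands in for the
// listener in QuorumCnxManager (ss.socket().bind(new InetSocketAddress(port))),
// the second bind stands in for the restarted server trying to rejoin the
// quorum while the stale holder never released the port.
public class PortStillInUseSketch {
    public static void main(String[] args) throws Exception {
        int port = 3888;  // illustrative election port

        ServerSocket staleHolder = new ServerSocket();
        staleHolder.bind(new InetSocketAddress(port));   // port now held and never released

        ServerSocket rejoiningServer = new ServerSocket();
        try {
            rejoiningServer.bind(new InetSocketAddress(port));  // fails: address already in use
        } catch (BindException e) {
            System.err.println("Bind failed, port " + port + " still in use: " + e.getMessage());
        } finally {
            rejoiningServer.close();
            staleHolder.close();
        }
    }
}
{code}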
> Unexpected exception in zookeeper
> ---------------------------------
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
> Reporter: Oded
> Priority: Critical
>
> We got an issue with one of the ZooKeepers (the Leader), causing the entire
> Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE
> /192.168.0.91:42490 ********
>
> We would expect ZooKeeper to elect another Leader and the Kafka cluster to
> continue working as expected, but that was not the case.
>
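For context on the stack trace above, a minimal standalone sketch (assumed
setup, not ZooKeeper code; the 2000 ms timeout only stands in for the leader's
learner read timeout) of how a blocking readInt() produces
java.net.SocketTimeoutException: Read timed out when the peer goes silent:
{code:java}
import java.io.DataInputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Standalone sketch: the "leader" side sets a read timeout and blocks in
// DataInputStream.readInt(), similar to LearnerHandler reading the next quorum
// packet; the "follower" side never writes, so the read times out.
public class ReadTimeoutSketch {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(0);
             Socket silentFollower = new Socket("127.0.0.1", listener.getLocalPort());
             Socket leaderSide = listener.accept()) {

            leaderSide.setSoTimeout(2000);  // illustrative value, not the real syncLimit * tickTime
            DataInputStream in = new DataInputStream(leaderSide.getInputStream());

            try {
                in.readInt();  // blocks until the timeout expires
            } catch (SocketTimeoutException e) {
                System.err.println("Read timed out, as in the log above: " + e.getMessage());
            }
        }
    }
}
{code}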