[ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645913#comment-16645913 ]

Aishwarya Soni edited comment on ZOOKEEPER-3036 at 10/11/18 3:41 AM:
---------------------------------------------------------------------

We hit the same issue a couple of days ago. We run ZooKeeper in a containerized 
AWS environment and had to restart the problem container to resolve it. The 
issue appears when the port binding fails: when a container becomes unhealthy 
it does not release the port, so when the server tries to bind to that port 
again to rejoin the quorum, the bind fails because the port is still in use 
and was never released, and we end up with *Unexpected exception causing 
shutdown while sock still open*
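
One quick way to confirm this from inside the container is to try binding the 
port yourself. A minimal, hypothetical probe (the class name and the default 
port 3888 are mine, not from ZooKeeper):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical probe, not part of ZooKeeper: try to bind the election port
// to see whether a stale socket from the old container instance still holds it.
public class ElectionPortProbe {
    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 3888;
        try (ServerSocket ss = new ServerSocket()) {
            ss.bind(new InetSocketAddress(port));
            System.out.println("Port " + port + " is free");
        } catch (IOException e) {
            System.out.println("Port " + port + " is still in use: " + e.getMessage());
        }
    }
}
{code}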

This is where the binding happens, in the QuorumCnxManager class in ZooKeeper:

{code:java}
ss.socket().bind(new InetSocketAddress(port));
{code}
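
For context, that bind happens inside the Listener thread of QuorumCnxManager, 
which retries the bind a few times before giving up. A rough, simplified 
paraphrase (from memory of the 3.4 branch, not a verbatim copy):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Simplified paraphrase of the bind-and-retry behaviour in
// QuorumCnxManager.Listener (not verbatim ZooKeeper source): an
// unreleased port makes every attempt fail with "Address already in use".
public class ListenerSketch {
    static ServerSocket bindElectionPort(int electionPort) throws IOException {
        IOException last = null;
        for (int numRetries = 0; numRetries < 3; numRetries++) {
            try {
                ServerSocket ss = new ServerSocket();
                ss.setReuseAddress(true);
                ss.bind(new InetSocketAddress(electionPort));
                return ss; // bound; the real Listener then accept()s quorum connections
            } catch (IOException e) {
                last = e; // "Address already in use" lands here
            }
        }
        throw last;
    }
}
{code}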

In the LearnerHandler class, it tries to use that socket, and when an 
exception occurs while the socket is still open, it logs the error:

{code:java}
if (sock != null && !sock.isClosed()) {
    LOG.error("Unexpected exception causing shutdown while sock "
            + "still open", e);
}
{code}

In most of these cases the socket is not null and still open, so this is the 
branch that fires.
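
The SocketTimeoutException in the stack trace from the original report comes 
from the blocking read on that socket. A simplified sketch of the read path 
(names are mine; in ZooKeeper the socket timeout is derived from the tickTime 
and syncLimit settings):

{code:java}
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

// Simplified sketch of the read path that throws in LearnerHandler.run()
// (not verbatim ZooKeeper source):
public class ReadPathSketch {
    static int readPacketHeader(Socket sock, int readTimeoutMs) throws IOException {
        sock.setSoTimeout(readTimeoutMs);
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(sock.getInputStream()));
        // Blocks until data arrives; throws SocketTimeoutException when the
        // follower goes silent for longer than the timeout, which is the
        // "Read timed out" in the reported stack trace.
        return in.readInt();
    }
}
{code}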


> Unexpected exception in zookeeper
> ---------------------------------
>
>                 Key: ZOOKEEPER-3036
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.4.10
>         Environment: 3 Zookeepers, 5 kafka servers
>            Reporter: Oded
>            Priority: Critical
>
> We got an issue with one of the ZooKeeper servers (the Leader), causing the 
> entire Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE 
> /192.168.0.91:42490 ********
>  
> We would expect ZooKeeper to elect another Leader and the Kafka cluster to 
> continue working as expected, but that was not the case.
>  


