[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097903#comment-13097903
 ] 

Thomas Koch commented on ZOOKEEPER-515:
---------------------------------------

I propose to close this issue. It has not been touched for two years, no 
response from the original reporter, old ZK version (3.2).

> Zookeeper quorum didn't provide service when restart after an "Out of memory" 
> crash
> -----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-515
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.2.0
>         Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> Jdk: 1.6.0_14 
>            Reporter: Qian Ye
>             Fix For: 3.5.0
>
>
> The Zookeeper quorum, containing 5 servers, didn't provide service when 
> restart after an "Out of memory" crash. 
> It happened as following:
> 1. we built  a Zookeeper quorum which contained  5 servers, say 1, 3, 4, 5, 6 
> (have no 2), and 6 was the leader.
> 2. we created 18 threads on 6 different servers to set and get data from a 
> znode in the Zookeeper at the same time.  The size of the data is 1MB. The 
> test threads did their job as fast as possible, no pause between two 
> operation, and they repeated the setting and getting 4000 times. 
> 3. the Zookeeper leader crashed about 10 mins  after the test threads 
> started. The leader printed out the log:
> 2009-08-25 12:00:12,301 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b5 due to java.io.IOException: Read error
> 2009-08-25 12:00:12,318 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b6 due to java.io.IOException: Read error
> 2009-08-25 12:03:44,086 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b8 due to java.io.IOException: Read error
> 2009-08-25 12:04:53,757 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b7 due to java.io.IOException: Read error
> 2009-08-25 12:15:45,151 - FATAL [SyncThread:0:SyncRequestProcessor@131] - 
> Severe unrecoverable error, exiting
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2786)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
>     at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
>     at 
> org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
>     at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42)
>     at 
> org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100)
> It is clear that the leader ran out of memory. then the server 4 was down 
> almost at the same time, and printed out the log:
> 2009-08-25 12:15:45,995 - ERROR 
> [FollowerRequestProcessor:3:FollowerRequestProcessor@91] - Unexpected 
> exception causing
> exit
> java.net.SocketException: Connection reset
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>     at 
> org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:51)
>     at 
> org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>     at 
> org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:97)
>     at org.apache.zookeeper.server.quorum.Follower.request(Follower.java:399)
>     at 
> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:86)
> 2009-08-25 12:15:45,996 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x423
> 4ab894330075 due to java.net.SocketException: Broken pipe
> 2009-08-25 12:15:45,996 - FATAL [SyncThread:3:SyncRequestProcessor@131] - 
> Severe unrecoverable error, exiting
> java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at 
> org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:100)
>     at 
> org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:52)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:147)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:92)
> 2009-08-25 12:15:45,995 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@309] - 
> Exception when following the leader
> java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at 
> org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:100)
>     at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:256)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:498)
> 2009-08-25 12:15:46,022 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 12:15:46,022 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 12:15:46,023 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> It was really strange that when the 2 server down, the other three servers 
> cannot provide service any more, the 'stat' command all leaded to 
> "ZooKeeperServer not running".
> 4. I restarted the server 6(the former leader) and the server 4. But the 
> service didn't come back. All the five servers printed "ZookeeperServer not 
> running". The server 6 printed the logs:
> 2009-08-25 14:02:15,395 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:02:27,703 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: Responded to info probe
> 2009-08-25 14:02:28,733 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:02:42,070 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:02:55,407 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:03:08,744 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:03:22,080 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:03:29,396 - ERROR [main:Util@238] - Last transaction was 
> partial.
> 2009-08-25 14:03:35,417 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> 2009-08-25 14:03:48,761 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running 
> The server 4 printed logs like:
> 2009-08-25 14:03:48,747 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 14:04:02,091 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 14:04:15,427 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 14:04:17,816 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@164] - 
> Unexpected exception, tries=0
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:498)
> 2009-08-25 14:04:18,820 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@164] - 
> Unexpected exception, tries=1
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:498)
> 2009-08-25 14:04:19,823 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@164] - 
> Unexpected exception, tries=2
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:498)
> 2009-08-25 14:04:28,764 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 14:04:42,101 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> the server 1, 3, 5 printed out the logs like:
> 2009-08-25 14:01:35,396 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: ZooKeeperServer not running
> 2009-08-25 14:01:36,554 - WARN  [QuorumPeer:/0.0.0.0:2181:LeaderElection@194] 
> - Ignoring exception while looking for lea
> der
> java.net.SocketTimeoutException: Receive timed out 
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at 
> java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:712)
>     at 
> org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)
> 2009-08-25 14:01:37,758 - WARN  [QuorumPeer:/0.0.0.0:2181:LeaderElection@194] 
> - Ignoring exception while looking for lea
> der
> java.net.SocketTimeoutException: Receive timed out 
>     at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>     at 
> java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
>     at java.net.DatagramSocket.receive(DatagramSocket.java:712)
>     at 
> org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)
> 2009-08-25 14:01:37,865 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@164] - 
> Unexpected exception, tries=0
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>     at java.net.Socket.connect(Socket.java:525)
>     at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:498)
> 2009-08-25 14:01:38,289 - WARN  [NIOServerCxn.Factory:2181:NIOServerCnxn@497] 
> - Exception causing close of session 0x0 d
> ue to java.io.IOException: Responded to info probe
> my zoo.cfg is like:
> tickTime=2000
> dataDir=./status/
> clientPort=2181
> initLimit=10
> syncLimit=2
> server.1=10.81.11.107:2888:3888
> server.2=10.81.11.106:2888:3888
> server.3=10.81.11.89:2888:3888
> server.4=10.81.11.99:2888:3888
> server.5=10.81.11.79:2888:3888
> Several questions:
> 1. Why the leader selection failed after the restart?
> 2. Is the size of data too big to be processed properly?
> 3. How can I recover from this situation? Can I just remove the version-2 
> directory on server 6(the former leader) and restart the server?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to