[ https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andor Molnar resolved ZOOKEEPER-3240. ------------------------------------- Resolution: Fixed Fix Version/s: 3.5.5 3.6.0 Issue resolved by pull request 767 [https://github.com/apache/zookeeper/pull/767] > Close socket on Learner shutdown to avoid dangling socket > --------------------------------------------------------- > > Key: ZOOKEEPER-3240 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240 > Project: ZooKeeper > Issue Type: Improvement > Components: server > Affects Versions: 3.6.0 > Reporter: Brian Nixon > Assignee: Brian Nixon > Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0, 3.5.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There was a Learner that had two connections to the Leader after that Learner > hit an unexpected exception during flush txn to disk, which will shutdown > previous follower instance and restart a new one. > > {quote}2018-10-26 02:31:35,568 ERROR > [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from > thread : SyncThread:3 > java.io.IOException: Input/output error > at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at > java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72) > at > java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172) > 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - > Thread SyncThread:3 exits, error code 1 > 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - > SyncRequestProcessor exited!{quote} > > It is supposed to close the previous socket, but it doesn't seem to be done > anywhere in the code. This leaves the socket open with no one reading from > it, and caused the queue full and blocked on sender. > > Since the LearnerHandler didn't shutdown gracefully, the learner queue size > keeps growing, the JVM heap size on leader keeps growing and added pressure > to the GC, and cause high GC time and latency in the quorum. > > The simple fix is to gracefully shutdown the socket. -- This message was sent by Atlassian JIRA (v7.6.3#76005)