[
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andor Molnar reopened ZOOKEEPER-3240:
-------------------------------------
Reopened the issue because I had to revert the commit due to failing unit
tests.
[~nixon] Please fix them so that this patch can be merged.
> Close socket on Learner shutdown to avoid dangling socket
> ---------------------------------------------------------
>
> Key: ZOOKEEPER-3240
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.6.0
> Reporter: Brian Nixon
> Assignee: Brian Nixon
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> A Learner ended up with two connections to the Leader after it hit an
> unexpected exception while flushing a txn to disk; the exception shut down
> the previous follower instance and started a new one.
>
> {quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:3
> java.io.IOException: Input/output error
> at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
> at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
> at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
> at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
> at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
> at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - SyncRequestProcessor exited!{quote}
>
> The previous socket is supposed to be closed at that point, but this doesn't
> seem to be done anywhere in the code. This left the socket open with no one
> reading from it, which caused the queue to fill up and the sender to block.
>
> Since the LearnerHandler didn't shut down gracefully, the learner queue size
> kept growing and the JVM heap size on the leader kept growing, which added
> pressure to the GC and caused high GC time and latency in the quorum.
>
> The simple fix is to gracefully shut down the socket.
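
For illustration only, here is a minimal sketch of the idea in the description: close the socket during shutdown so the peer sees end-of-stream instead of a dangling connection. This is not the actual patch. The class names LearnerSketch and LearnerShutdownDemo, the field name sock, and the loopback setup are all assumptions made for the example; only the java.net calls are real.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical stand-in for a learner that owns one connection to the leader. */
class LearnerSketch {
    private final Socket sock; // connection to the leader (assumed field name)

    LearnerSketch(Socket sock) {
        this.sock = sock;
    }

    /** Close the socket on shutdown so the remote side is not left with a dangling connection. */
    void shutdown() {
        try {
            if (sock != null && !sock.isClosed()) {
                sock.close();
            }
        } catch (IOException e) {
            // Best effort: the instance is going away regardless.
            System.err.println("Ignoring exception while closing socket: " + e);
        }
    }

    boolean isSocketClosed() {
        return sock == null || sock.isClosed();
    }
}

class LearnerShutdownDemo {
    /** Returns what the leader side reads after the learner shuts down (-1 means end of stream). */
    static int leaderReadAfterShutdown() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket leader = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket learnerSide = new Socket(leader.getInetAddress(), leader.getLocalPort());
            try (Socket leaderSide = leader.accept()) {
                LearnerSketch learner = new LearnerSketch(learnerSide);
                learner.shutdown();
                if (!learner.isSocketClosed()) {
                    throw new IllegalStateException("socket still open after shutdown");
                }
                InputStream in = leaderSide.getInputStream();
                return in.read();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("leader read returned " + leaderReadAfterShutdown());
    }
}
```

With the socket closed, the side that was sending to the learner can detect the end of the connection and stop queueing packets for it, rather than blocking on a socket that no one reads.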