[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741714#comment-16741714
 ] 

Michael Han commented on ZOOKEEPER-3240:
----------------------------------------

[~nixon] :
bq.  so the Leader is unable to sense the change in Learner status through the 
status of the network connection

A plausible theory :)

The ping packet between the leader and learners is designed to solve this exact 
problem - detecting the liveness of the other side. For each learner, the 
leader continuously reads packets from the socket associated with that learner 
in the corresponding LearnerHandler thread. This read has a timeout configured 
on the leader-side socket, so even if the sockets on both sides are valid but 
there is no traffic (as in this case, where the learner leaks sockets by not 
properly closing them after shutting down), the leader's read should eventually 
time out after the sync limit check. Unless:

* The leader's socket read timeout has no effect, so the leader blocks 
indefinitely on the read because there is no traffic from the learner.
* The learner process, after restarting, somehow ended up reusing the old 
leaked learner socket, so the corresponding LearnerHandler thread can't detect 
any difference (which is expected). I am not sure how likely this case is in 
practice.

In any case, it seems that our ping mechanism failed to detect the network 
change here.
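For reference, the expected leader-side behavior - a read on an otherwise-valid 
socket timing out when the peer goes silent - can be sketched in plain Java 
outside ZooKeeper (this is an illustration of SO_TIMEOUT semantics, not the 
actual LearnerHandler code):

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
    // Returns true if the "leader"-side read times out because the
    // "learner" side never writes anything (a silent, leaked socket).
    static boolean silentPeerDetected(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket learnerSide = new Socket("localhost", server.getLocalPort());
             Socket leaderSide = server.accept()) {
            // Analogous to the sync-limit-derived timeout on the
            // LearnerHandler's socket.
            leaderSide.setSoTimeout(timeoutMs);
            try {
                leaderSide.getInputStream().read(); // blocks: no traffic
                return false;
            } catch (SocketTimeoutException e) {
                // Both sockets are still open and valid, yet the silent
                // peer is detected via the read timeout.
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(silentPeerDetected(200)
                ? "read timed out: silent learner detected"
                : "unexpected: read returned data");
    }
}
```

If the leader did not observe this timeout in the reported incident, that points 
at one of the two exceptional cases above.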

bq. the learner queue size keeps growing

Do you mind elaborating a bit on which exact queue this is and what caused it 
to grow?





> Close socket on Learner shutdown to avoid dangling socket
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-3240
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Brian Nixon
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> There was a Learner that had two connections to the Leader after that Learner 
> hit an unexpected exception while flushing a txn to disk, which shut down the 
> previous follower instance and restarted a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Input/output error
>         at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>         at 
> java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>         at 
> java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>         at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>         at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>         at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>         at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>         at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
> Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
> SyncRequestProcessor exited!{quote}
>  
> It is supposed to close the previous socket, but that doesn't seem to be done 
> anywhere in the code. This leaves the socket open with no one reading from 
> it, which filled the queue and blocked the sender.
>  
> Since the LearnerHandler didn't shut down gracefully, the learner queue size 
> keeps growing, the JVM heap size on the leader keeps growing and adds 
> pressure to the GC, causing high GC time and latency in the quorum.
>  
> The simple fix is to gracefully shut down the socket.
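The idea of that fix can be sketched as follows (hypothetical class and method 
names for illustration only, not the actual patch): on learner shutdown, close 
the socket to the leader so the leader's LearnerHandler sees EOF immediately 
instead of holding a dangling connection.

```java
import java.io.IOException;
import java.net.Socket;

public class LearnerShutdownSketch {
    private Socket sock; // connection to the leader

    LearnerShutdownSketch(Socket sock) {
        this.sock = sock;
    }

    // Called when the follower instance is torn down, e.g. after an
    // unrecoverable IOException in the sync thread.
    void shutdown() {
        if (sock != null && !sock.isClosed()) {
            try {
                // The leader's blocked read now fails fast with EOF or a
                // reset, rather than lingering until a timeout (or forever).
                sock.close();
            } catch (IOException e) {
                // Best effort: swallow and continue shutting down.
            }
        }
        sock = null;
    }
}
```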



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
