[
https://issues.apache.org/jira/browse/ZOOKEEPER-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001034#comment-17001034
]
Fangmin Lv commented on ZOOKEEPER-3573:
---------------------------------------
Recently, we found that without SO_LINGER, sessions might expire unexpectedly
even when the close is done asynchronously.
Suppose there is a network issue between the leader and one follower. The
leader detects a read timeout on its inbound connection, while the outbound
direction is slow but still alive, so the follower does not detect a read
timeout. The leader then closes the TLS connection, but blocks on sending the
close_notify packet because the send buffer is full, and the close may take
more than 30s. Only after that does the follower detect the network issue,
shut down, and close all of its client connections, but by then those sessions
have already timed out.
Jie is following up with the OpenJDK community to see whether this option can
be supported for SSL sockets in OpenJDK 11+, or whether there is an alternative
solution.
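One possible alternative, sketched here under assumptions rather than as the fix ZooKeeper adopted: bound the graceful TLS close with a timeout and, if sending close_notify is still stuck, close the underlying TCP socket instead. This assumes the SSLSocket was layered over a plain Socket (for example via SSLSocketFactory.createSocket(Socket, String, int, boolean)) so the raw socket is still reachable. It relies on the documented behavior of java.net.Socket.close(), which makes any thread blocked in I/O on that socket throw a SocketException. BoundedClose and closeWithTimeout are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of ZooKeeper.
final class BoundedClose {

    /**
     * Runs graceful.close() (e.g. SSLSocket.close(), which may block while
     * sending close_notify) on a daemon thread. If it has not finished within
     * timeoutMs, or it fails, closes the fallback (assumed to be the
     * underlying TCP socket), which should unblock the stuck write.
     * Returns true only if the graceful close completed in time.
     */
    static boolean closeWithTimeout(Closeable graceful, Closeable fallback, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        try {
            Future<Object> pending = executor.submit(() -> {
                graceful.close();
                return null;
            });
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            closeQuietly(fallback); // abort: tear down the raw connection
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            closeQuietly(fallback);
            return false;
        } finally {
            executor.shutdown(); // worker thread exits once close() returns
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // best effort
        }
    }

    public static void main(String[] args) {
        // Simulate a close_notify write that stays blocked until the raw socket is closed.
        CountDownLatch rawClosed = new CountDownLatch(1);
        Closeable stuckTls = () -> {
            try {
                rawClosed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Closeable rawSocket = rawClosed::countDown;
        boolean graceful = closeWithTimeout(stuckTls, rawSocket, 100);
        System.out.println("graceful=" + graceful + ", rawClosed=" + (rawClosed.getCount() == 0));
    }
}
```

The timeout caps how long a close can stall shutdown, at the cost of an unclean TLS teardown when it fires.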
> Dealing with long TLS connection closing time without SO_LINGER option
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-3573
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3573
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.6.0
> Reporter: Jie Huang
> Priority: Major
>
> As described in ZOOKEEPER-3384, with SSL sockets a close_notify must be sent
> before the write side of a connection is closed. When the send buffer is full
> and writes are blocked, sending close_notify, and therefore closing the
> socket, can take a long time. On followers with a partitioned-away leader,
> this long closing time would stall the shutdown process and delay the new
> leader election needed to establish a new quorum. As a result, the ensemble
> would be unavailable for a long time.
> In ZOOKEEPER-3384, the SO_LINGER option is used to close the socket quickly
> (and potentially uncleanly). In JDK 11, however, the SO_LINGER option is not
> honored, so we need a new way to avoid the long period of quorum
> unavailability.
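For reference, a minimal sketch of the SO_LINGER idea described above. It assumes a linger timeout of 0 seconds, which is an assumption and not necessarily the value ZOOKEEPER-3384 uses. With linger enabled and a zero timeout, close() on a TCP socket typically discards unsent data and resets the connection instead of waiting for the send buffer to drain. Because this issue reports that JDK 11 does not honor the option for SSL sockets, the sketch applies it to a plain java.net.Socket; AbortiveClose and closeAbortively are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper illustrating an abortive close via SO_LINGER.
final class AbortiveClose {

    /**
     * Enables SO_LINGER with a 0-second timeout, then closes the socket.
     * On common TCP stacks this drops any unsent data and sends a RST
     * rather than blocking until queued bytes are delivered.
     */
    static void closeAbortively(Socket socket) throws IOException {
        socket.setSoLinger(true, 0);
        socket.close();
    }

    public static void main(String[] args) throws IOException {
        // Loopback connection used only to demonstrate the call sequence.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            closeAbortively(client);
            System.out.println("client closed: " + client.isClosed());
        }
    }
}
```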