[
https://issues.apache.org/jira/browse/ZOOKEEPER-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ZOOKEEPER-3384:
--------------------------------------
Labels: pull-request-available (was: )
> Avoid long quorum unavailable time due to TLS connection close stalled with
> full send buffer
> --------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3384
> Project: ZooKeeper
> Issue Type: Sub-task
> Components: server
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0
>
>
>
> *Problem*
> When close() is called on an SSL socket, a close_notify alert must be sent
> before the write side of the connection is closed. If the leader is
> partitioned away, learner shutdown may take a long time when the send buffer
> is full, because close() blocks while sending the close_notify packet.
> The SSLSocketImpl implementation still honors the SO_LINGER socket option.
> The difference is that even when the SO_LINGER time is set to 0, it still
> tries to issue the close_notify packet, but it gives up and closes the
> socket if it cannot acquire the write lock immediately.
> Setting SO_LINGER to a small value avoids a long stall during shutdown, and
> that is what we're going to do here.
> *Any Cons of doing this?*
> According to the TLS RFC, the close handshake was added to prevent a
> truncation attack, in which an attacker inserts a TCP code indicating that
> the message has finished, so that the recipient never picks up the rest of
> the message. It is fine if the peer does not send close_notify in some
> cases, for example when the client crashes or is killed. For us, the
> close_notify usually is not sent, and has no chance to be sent, during a
> rolling restart.
> The RFC also mentions that failing to send close_notify prevents the SSL
> session from being resumed. Since reusable session IDs do not benefit the
> ZooKeeper quorum anyway, this is not a problem for us.
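
The SO_LINGER approach described under *Problem* could look roughly like the sketch below. It is not taken from the actual patch: the class name, the helper names and the 2-second value are assumptions made for illustration (the issue only says "a small number"). It uses a plain, unconnected java.net.Socket so it runs without a network peer. SSLSocket inherits setSoLinger() from Socket, but the close_notify behavior described above is not exercised here.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical helper, not the code from the ZooKeeper patch.
class LingerCloseSketch {

    // Assumed value for illustration; the issue only asks for "a small number".
    static final int LINGER_SECONDS = 2;

    // Enable SO_LINGER with a short timeout (in seconds) on the socket.
    static void applyShortLinger(Socket sock, int lingerSeconds) throws IOException {
        sock.setSoLinger(true, lingerSeconds);
    }

    // Set the linger option, then close. Null or already-closed sockets are ignored.
    static void closeQuickly(Socket sock, int lingerSeconds) throws IOException {
        if (sock == null || sock.isClosed()) {
            return;
        }
        applyShortLinger(sock, lingerSeconds);
        sock.close();
    }

    public static void main(String[] args) throws IOException {
        // Unconnected socket: enough to show the option being set and read back.
        Socket sock = new Socket();
        applyShortLinger(sock, LINGER_SECONDS);
        System.out.println("SO_LINGER seconds: " + sock.getSoLinger());
        closeQuickly(sock, LINGER_SECONDS);
        System.out.println("closed: " + sock.isClosed());
    }
}
```

In a real quorum connection the option would be set on the SSL socket before it is closed during learner shutdown. Where exactly that happens in the ZooKeeper code is not stated in this message.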
--
This message was sent by Atlassian Jira
(v8.3.2#803003)