[ https://issues.apache.org/jira/browse/ZOOKEEPER-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ZOOKEEPER-3384:
--------------------------------------
    Labels: pull-request-available  (was: )

> Avoid long quorum unavailable time due to TLS connection close stalled with full send buffer
> --------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3384
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3384
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: server
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>
> *Problem*
> For an SSL socket, close() is required to send a close_notify alert before
> closing the write side of the connection. If the leader is partitioned away,
> learner shutdown may take a long time when the send buffer is full, because
> close() blocks on sending the close_notify packet.
> The SSLSocketImpl implementation still honors the SO_LINGER socket option.
> The difference is that even when SO_LINGER is set to 0, it still tries to
> issue the close_notify packet, but it fails immediately and closes the socket
> if it cannot acquire the write lock right away.
> Setting SO_LINGER to a small number avoids a long stall during shutdown,
> which is what we're going to do here.
> *Any Cons of doing this?*
> According to the TLS RFC, the close handshake was added to avoid a truncation
> attack, where an attacker inserts into a message a TCP code indicating that
> the message has finished, thus preventing the recipient from picking up the
> rest of the message. But it's fine if the peer doesn't send close_notify in
> some cases, for example when the client crashed or was killed. For us,
> close_notify usually is not sent, and has no chance to be sent, during a
> rolling restart.
> Another thing mentioned in the RFC is that failing to send close_notify
> makes the SSL session non-resumable. Given that a reusable session ID does
> not benefit the ZooKeeper quorum anyway, this is not a problem for us.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
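
For illustration only, here is a minimal sketch of the SO_LINGER change described in the issue. It is not the actual ZooKeeper patch: the class name LingerSketch, the helper names, and the 2-second value are hypothetical (the issue only says "a small number"). A plain loopback Socket is used so the example runs without keystores; javax.net.ssl.SSLSocket extends java.net.Socket, so the same setSoLinger call is available on a TLS socket.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerSketch {
    // Hypothetical value: the issue does not state which number is used.
    static final int LINGER_SECONDS = 2;

    // Enable SO_LINGER with a bounded timeout so that close() waits at most
    // `seconds` for unsent data instead of blocking indefinitely.
    static void configureLinger(Socket socket, int seconds) throws IOException {
        socket.setSoLinger(true, seconds);
    }

    // Opens a loopback connection, applies the linger setting to the client
    // side, and returns the value the socket reports back.
    static int lingerAfterConfigure(int seconds) throws IOException {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            configureLinger(client, seconds);
            return client.getSoLinger();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(
            "SO_LINGER = " + lingerAfterConfigure(LINGER_SECONDS) + "s");
    }
}
```

In a real quorum connection the call would presumably be made once, right after the socket to the peer is created or accepted, so that a later close() during learner shutdown is bounded by the linger timeout.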