Fangmin Lv created ZOOKEEPER-3384:
-------------------------------------

             Summary: Avoid long quorum unavailable time due to TLS connection close stalled with full send buffer
                 Key: ZOOKEEPER-3384
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3384
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv
             Fix For: 3.6.0


 

*Problem*

For an SSL socket, close() must send a close_notify alert before closing the 
write side of the connection. If the leader is partitioned away, learner 
shutdown may take a long time when the send buffer is full, because close() 
blocks while trying to send the close_notify packet.

Judging from the SSLSocketImpl implementation, it still honors the SO_LINGER 
socket option. The difference is that even with SO_LINGER set to 0, it still 
tries to send the close_notify packet. However, if it cannot acquire the write 
lock immediately, it gives up at once and closes the socket.

Setting SO_LINGER to a small value avoids a long stall during shutdown, and 
this is the approach taken here.
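A minimal sketch of the idea using only JDK calls, assuming the learner holds a 
java.net.Socket (an SSLSocket is a subclass, so it can be passed the same way). 
The class name, helper names, and the 1-second value are hypothetical. The issue 
only asks for "a small number" and does not say which value the fix uses, and 
this is not ZooKeeper's actual code. Socket.setSoLinger(true, n) takes the 
timeout in seconds.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not ZooKeeper's actual code: bounds how long close()
// may stall when the peer is unreachable and the send buffer is full.
public class LingerCloseSketch {

    // Assumed value for illustration; the issue only asks for "a small number".
    static final int CLOSE_LINGER_SECONDS = 1;

    /** Enables SO_LINGER with the given timeout in seconds; returns false if it cannot be set. */
    static boolean setShortLinger(Socket sock, int seconds) {
        try {
            sock.setSoLinger(true, seconds);
            return true;
        } catch (SocketException e) {
            // For example, the socket is already closed.
            return false;
        }
    }

    /** Applies the short linger timeout, then closes the socket, ignoring close-time errors. */
    static void closeWithShortLinger(Socket sock, int seconds) {
        setShortLinger(sock, seconds);
        try {
            // For an SSLSocket, close() first attempts to send close_notify;
            // the short linger is meant to keep that attempt from blocking for long.
            sock.close();
        } catch (IOException e) {
            // Shutdown path: nothing useful to do with the error here.
        }
    }

    public static void main(String[] args) throws IOException {
        // An unconnected plain socket, only to demonstrate the calls;
        // a learner's SSLSocket would be handled the same way.
        Socket sock = new Socket();
        System.out.println("linger set: " + setShortLinger(sock, CLOSE_LINGER_SECONDS));
        System.out.println("SO_LINGER = " + sock.getSoLinger());
        closeWithShortLinger(sock, CLOSE_LINGER_SECONDS);
        System.out.println("closed: " + sock.isClosed());
    }
}
```

The option is applied just before close() so it only affects the shutdown path. 
Per the SSLSocketImpl behavior described above, the close_notify attempt is still 
made, but the wait is bounded rather than open-ended.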

*Any cons of doing this?*

According to the TLS RFC, the close handshake exists to prevent a truncation 
attack, in which an attacker inserts a TCP code into a message indicating that 
the message has finished, so the recipient never picks up the rest of it. 
However, it is acceptable for a peer not to send close_notify in some cases, for 
example when the client crashes or is killed. For us, close_notify usually is 
not sent, and has no chance to be sent, during a rolling restart anyway.

The RFC also notes that failing to send close_notify prevents the SSL session 
from being resumed. Since reusable session IDs do not benefit the ZooKeeper 
quorum anyway, this is not a problem for us.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
