This is an automated email from the ASF dual-hosted git repository.
eolivelli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zookeeper.git
The following commit(s) were added to refs/heads/master by this push:
new b4f6e82 ZOOKEEPER-3574: Close quorum socket asynchronously to avoid server shutdown stalled by long socket closing time
b4f6e82 is described below
commit b4f6e82de10e6f87836f8221970110dad2b3825c
Author: Jie Huang <[email protected]>
AuthorDate: Sun May 3 19:43:27 2020 +0200
ZOOKEEPER-3574: Close quorum socket asynchronously to avoid server shutdown stalled by long socket closing time
…utdown stalled by long socket closing time
Author: Jie Huang <[email protected]>
Reviewers: Enrico Olivelli <[email protected]>, Fangmin Lyu <[email protected]>
Closes #1115 from jhuan31/ZOOKEEPER-3574
---
.../src/main/resources/markdown/zookeeperAdmin.md | 173 +++++++++++----------
.../org/apache/zookeeper/server/ServerMetrics.java | 5 +
.../apache/zookeeper/server/quorum/Learner.java | 26 ++++
3 files changed, 120 insertions(+), 84 deletions(-)
diff --git a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md b/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
index ca1a3f7..6c616d1 100644
--- a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
+++ b/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
@@ -202,11 +202,11 @@ ensemble:
though about a few here:
Every machine that is part of the ZooKeeper ensemble should know
about every other machine in the ensemble. You accomplish this with
- the series of lines of the form **server.id=host:port:port**.
- (The parameters **host** and **port** are straightforward, for each server
+ the series of lines of the form **server.id=host:port:port**.
+ (The parameters **host** and **port** are straightforward, for each server
you need to specify first a Quorum port then a dedicated port for ZooKeeper leader
- election). Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
- for each ZooKeeper server instance (this can increase availability when multiple physical
+ election). Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
+ for each ZooKeeper server instance (this can increase availability when multiple physical
network interfaces can be used parallel in the cluster).
You attribute the
server id to each machine by creating a file named
@@ -233,7 +233,7 @@ ensemble:
ensemble.
7. If your configuration file is set up, you can start a
- ZooKeeper server:
+ ZooKeeper server:
$ java -cp zookeeper.jar:lib/*:conf org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.conf
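For orientation (this example is not part of the diff), a minimal three-server configuration using the server.id=host:port:port form described above could look like the sketch below; hostnames, paths and port numbers are placeholders.

    # Sketch of a three-node ensemble; values are placeholders.
    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    # server.<id>=<host>:<quorum port>:<leader election port>
    server.1=zoo1:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888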
@@ -1004,7 +1004,7 @@ property, when available, is noted below.
* *zookeeper.request_throttler.shutdownTimeout* :
(Java system property only)
**New in 3.6.0:**
- The time (in milliseconds) the RequestThrottler waits for the request queue to drain during shutdown before it shuts down forcefully. The default is 10000.
+ The time (in milliseconds) the RequestThrottler waits for the request queue to drain during shutdown before it shuts down forcefully. The default is 10000.
* *advancedFlowControlEnabled* :
(Java system property: **zookeeper.netty.advancedFlowControl.enabled**)
@@ -1032,59 +1032,59 @@ property, when available, is noted below.
**New in 3.6.0:**
The digest feature is added to detect the data inconsistency inside
ZooKeeper when loading database from disk, catching up and following
- leader, its doing incrementally hash check for the DataTree based on
+ leader, its doing incrementally hash check for the DataTree based on
the adHash paper mentioned in
https://cseweb.ucsd.edu/~daniele/papers/IncHash.pdf
- The idea is simple, the hash value of DataTree will be updated incrementally
- based on the changes to the set of data. When the leader is preparing the txn,
- it will pre-calculate the hash of the tree based on the changes happened with
+ The idea is simple, the hash value of DataTree will be updated incrementally
+ based on the changes to the set of data. When the leader is preparing the txn,
+ it will pre-calculate the hash of the tree based on the changes happened with
formula:
current_hash = current_hash + hash(new node data) - hash(old node data)
- If it’s creating a new node, the hash(old node data) will be 0, and if it’s a
+ If it’s creating a new node, the hash(old node data) will be 0, and if it’s a
delete node op, the hash(new node data) will be 0.
- This hash will be associated with each txn to represent the expected hash value
- after applying the txn to the data tree, it will be sent to followers with
- original proposals. Learner will compare the actual hash value with the one in
- the txn after applying the txn to the data tree, and report mismatch if it’s not
+ This hash will be associated with each txn to represent the expected hash value
+ after applying the txn to the data tree, it will be sent to followers with
+ original proposals. Learner will compare the actual hash value with the one in
+ the txn after applying the txn to the data tree, and report mismatch if it’s not
the same.
- These digest value will also be persisted with each txn and snapshot on the disk,
- so when servers restarted and load data from disk, it will compare and see if
+ These digest value will also be persisted with each txn and snapshot on the disk,
+ so when servers restarted and load data from disk, it will compare and see if
there is hash mismatch, which will help detect data loss issue on disk.
- For the actual hash function, we’re using CRC internally, it’s not a collisionless
- hash function, but it’s more efficient compared to collisionless hash, and the
+ For the actual hash function, we’re using CRC internally, it’s not a collisionless
+ hash function, but it’s more efficient compared to collisionless hash, and the
collision possibility is really really rare and can already meet our needs here.
- This feature is backward and forward compatible, so it can safely rolling upgrade,
- downgrade, enabled and later disabled without any compatible issue. Here are the
+ This feature is backward and forward compatible, so it can safely rolling upgrade,
+ downgrade, enabled and later disabled without any compatible issue. Here are the
scenarios have been covered and tested:
- 1. When leader runs with new code while follower runs with old one, the digest will
- be append to the end of each txn, follower will only read header and txn data,
- digest value in the txn will be ignored. It won't affect the follower reads and
+ 1. When leader runs with new code while follower runs with old one, the digest will
+ be append to the end of each txn, follower will only read header and txn data,
+ digest value in the txn will be ignored. It won't affect the follower reads and
processes the next txn.
2. When leader runs with old code while follower runs with new one, the digest won't
- be sent with txn, when follower tries to read the digest, it will throw EOF which
+ be sent with txn, when follower tries to read the digest, it will throw EOF which
is caught and handled gracefully with digest value set to null.
3. When loading old snapshot with new code, it will throw IOException when trying to
read the non-exist digest value, and the exception will be caught and digest will
- be set to null, which means we won't compare digest when loading this snapshot,
+ be set to null, which means we won't compare digest when loading this snapshot,
which is expected to happen during rolling upgrade
- 4. When loading new snapshot with old code, it will finish successfully after deserialzing
+ 4. When loading new snapshot with old code, it will finish successfully after deserialzing
the data tree, the digest value at the end of snapshot file will be ignored
- 5. The scenarios of rolling restart with flags change are similar to the 1st and 2nd
+ 5. The scenarios of rolling restart with flags change are similar to the 1st and 2nd
scenarios discussed above, if the leader enabled but follower not, digest value will
be ignored, and follower won't compare the digest during runtime; if leader disabled
but follower enabled, follower will get EOF exception which is handled gracefully.
- Note: the current digest calculation excluded nodes under /zookeeper
- due to the potential inconsistency in the /zookeeper/quota stat node,
+ Note: the current digest calculation excluded nodes under /zookeeper
+ due to the potential inconsistency in the /zookeeper/quota stat node,
we can include that after that issue is fixed.
By default, this feautre is disabled, set "true" to enable it.
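To make the incremental formula above concrete, here is a small, self-contained Java sketch of an adHash-style tree hash. It only illustrates the update rule current_hash = current_hash + hash(new node data) - hash(old node data); it is not the actual ZooKeeper implementation, and the CRC32 choice simply mirrors the CRC-based hash mentioned in the text.

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    // Toy illustration of the incremental DataTree hash described above.
    public class IncrementalTreeHash {

        private long currentHash = 0;

        // A missing node (null data) contributes 0, so creates and deletes
        // fall out of the same update rule.
        private static long hash(String path, byte[] data) {
            if (data == null) {
                return 0;
            }
            CRC32 crc = new CRC32();
            crc.update(path.getBytes(StandardCharsets.UTF_8));
            crc.update(data);
            return crc.getValue();
        }

        // current_hash = current_hash + hash(new node data) - hash(old node data)
        public void recordChange(String path, byte[] oldData, byte[] newData) {
            currentHash = currentHash + hash(path, newData) - hash(path, oldData);
        }

        public long getHash() {
            return currentHash;
        }
    }

Because the per-node hashes are combined by plain addition and subtraction, a follower that applies the same transactions arrives at the same value, which is what makes the leader/learner comparison described above possible.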
@@ -1107,7 +1107,7 @@ property, when available, is noted below.
**New in 3.6.0:**
By default audit logs are disabled. Set to "true" to enable it. Default value is "false".
See the [ZooKeeper audit logs](zookeeperAuditLogs.html) for more information.
-
+
* *audit.impl.class* :
(Java system property: **zookeeper.audit.impl.class**)
**New in 3.6.0:**
@@ -1125,12 +1125,12 @@ property, when available, is noted below.
**New in 3.6.0:**
The size threshold after which a request is considered a large request. If it is -1, then all requests are considered small, effectively turning off large request throttling. The default is -1.
-* *outstandingHandshake.limit*
+* *outstandingHandshake.limit*
(Jave system property only: **zookeeper.netty.server.outstandingHandshake.limit**)
- The maximum in-flight TLS handshake connections could have in ZooKeeper,
- the connections exceed this limit will be rejected before starting handshake.
- This setting doesn't limit the max TLS concurrency, but helps avoid herd
- effect due to TLS handshake timeout when there are too many in-flight TLS
+ The maximum in-flight TLS handshake connections could have in ZooKeeper,
+ the connections exceed this limit will be rejected before starting handshake.
+ This setting doesn't limit the max TLS concurrency, but helps avoid herd
+ effect due to TLS handshake timeout when there are too many in-flight TLS
handshakes. Set it to something like 250 is good enough to avoid herd effect.
* *throttledOpWaitTime*
@@ -1142,6 +1142,11 @@ property, when available, is noted below.
The intent is for the clients not to retry them immediately.
When set to 0, no requests will be throttled. The default is 0.
+* *learner.closeSocketAsync*
+ (Java system property only: **learner.closeSocketAsync**)
+ When enabled, a learner will close the quorum socket asynchronously. This is useful for TLS connections where closing a socket might take a long time, block the shutdown process, potentially delay a new leader election, and leave the quorum unavailable. Closing the socket asynchronously avoids blocking the shutdown process despite the long socket closing time, and a new leader election can be started while the socket is being closed. The default is false.
+
+
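As a usage sketch (not part of the diff): since this is a Java system property only, it has to be passed to the server JVM; the SERVER_JVMFLAGS hook used by zkServer.sh is an assumed deployment detail here.

    # Sketch: enable asynchronous quorum-socket close on a learner
    export SERVER_JVMFLAGS="-Dlearner.closeSocketAsync=true"
    bin/zkServer.sh restart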
<a name="sc_clusterOptions"></a>
#### Cluster Options
@@ -1155,13 +1160,13 @@ of servers -- that is, when deploying clusters of servers.
non-authenticated UDP-based version of fast leader election, "2"
corresponds to the authenticated UDP-based version of fast
leader election, and "3" corresponds to TCP-based version of
- fast leader election. Algorithm 3 was made default in 3.2.0 and
+ fast leader election. Algorithm 3 was made default in 3.2.0 and
prior versions (3.0.0 and 3.1.0) were using algorithm 1 and 2 as well.
###### Note
- >The implementations of leader election 1, and 2 were
- **deprecated** in 3.4.0. Since 3.6.0 only FastLeaderElection is available,
- in case of upgrade you have to shutdown all of your servers and
- restart them with electionAlg=3 (or by removing the line from the configuration file).
[...]
+ >The implementations of leader election 1, and 2 were
+ **deprecated** in 3.4.0. Since 3.6.0 only FastLeaderElection is available,
+ in case of upgrade you have to shutdown all of your servers and
+ restart them with electionAlg=3 (or by removing the line from the configuration file).
[...]
* *maxTimeToWaitForEpoch* :
(Java system property: **zookeeper.leader.maxTimeToWaitForEpoch**)
@@ -1215,15 +1220,15 @@ of servers -- that is, when deploying clusters of servers.
The first followers use to connect to the leader, and the second is for
leader election. If you want to test multiple servers on a single machine, then
different ports can be used for each server.
-
+
<a name="id_multi_address"></a>
Since ZooKeeper 3.6.0 it is possible to specify **multiple addresses** for each
ZooKeeper server (see [ZOOKEEPER-3188](https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3188)).
To enable this feature, you must set the *multiAddress.enabled* configuration property
- to *true*. This helps to increase availability and adds network level
- resiliency to ZooKeeper. When multiple physical network interfaces are used
- for the servers, ZooKeeper is able to bind on all interfaces and runtime switching
+ to *true*. This helps to increase availability and adds network level
+ resiliency to ZooKeeper. When multiple physical network interfaces are used
+ for the servers, ZooKeeper is able to bind on all interfaces and runtime switching
to a working interface in case a network error. The different addresses can be specified
in the config using a pipe ('|') character. A valid configuration using multiple addresses looks like:
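The guide's concrete example falls outside this hunk; a sketch of such a pipe-separated entry (placeholder hostnames, and it requires multiAddress.enabled=true as noted above) would be:

    multiAddress.enabled=true
    # two network interfaces for server 1, separated by '|'
    server.1=zoo1-net1:2888:3888|zoo1-net2:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888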
@@ -1375,16 +1380,16 @@ As an example, this will enable all four letter word commands:
* *electionPortBindRetry* :
(Java system property only: **zookeeper.electionPortBindRetry**)
- Property set max retry count when Zookeeper server fails to bind
- leader election port. Such errors can be temporary and recoverable,
+ Property set max retry count when Zookeeper server fails to bind
+ leader election port. Such errors can be temporary and recoverable,
such as DNS issue described in
[ZOOKEEPER-3320](https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3320),
- or non-retryable, such as port already in use.
- In case of transient errors, this property can improve availability
- of Zookeeper server and help it to self recover.
- Default value 3. In container environment, especially in Kubernetes,
- this value should be increased or set to 0(infinite retry) to overcome issues
+ or non-retryable, such as port already in use.
+ In case of transient errors, this property can improve availability
+ of Zookeeper server and help it to self recover.
+ Default value 3. In container environment, especially in Kubernetes,
+ this value should be increased or set to 0(infinite retry) to overcome issues
related to DNS name resolving.
-
+
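A usage sketch for the Kubernetes scenario mentioned above under electionPortBindRetry (the SERVER_JVMFLAGS hook is an assumed deployment detail):

    # Sketch: retry binding the leader election port indefinitely, e.g. on Kubernetes
    export SERVER_JVMFLAGS="-Dzookeeper.electionPortBindRetry=0"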
* *observer.reconnectDelayMs* :
(Java system property: **zookeeper.observer.reconnectDelayMs**)
@@ -1414,9 +1419,9 @@ As an example, this will enable all four letter word commands:
The options in this section allow control over
encryption/authentication/authorization performed by the service.
-Beside this page, you can also find useful information about client side configuration in the
-[Programmers Guide](zookeeperProgrammers.html#sc_java_client_configuration).
-The ZooKeeper Wiki also has useful pages about [ZooKeeper SSL support](https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide),
+Beside this page, you can also find useful information about client side configuration in the
+[Programmers Guide](zookeeperProgrammers.html#sc_java_client_configuration).
+The ZooKeeper Wiki also has useful pages about [ZooKeeper SSL support](https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide),
and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+and+SASL).
* *DigestAuthenticationProvider.superDigest* :
@@ -1509,8 +1514,8 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
* *ssl.keyStore.type* and *ssl.quorum.keyStore.type* :
(Java system properties: **zookeeper.ssl.keyStore.type** and **zookeeper.ssl.quorum.keyStore.type**)
**New in 3.5.5:**
- Specifies the file format of client and quorum keystores. Values: JKS, PEM, PKCS12 or null (detect by filename).
- Default: null
+ Specifies the file format of client and quorum keystores. Values: JKS, PEM, PKCS12 or null (detect by filename).
+ Default: null
* *ssl.trustStore.location* and *ssl.trustStore.password* and *ssl.quorum.trustStore.location* and *ssl.quorum.trustStore.password* :
(Java system properties: **zookeeper.ssl.trustStore.location** and **zookeeper.ssl.trustStore.password** and **zookeeper.ssl.quorum.trustStore.location** and **zookeeper.ssl.quorum.trustStore.password**)
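Because the long socket-close times addressed by this commit mainly show up on TLS quorum connections, a sketch of a quorum keystore/truststore configuration using the properties described here may help; the sslQuorum switch, file paths, passwords and PKCS12 type are illustrative assumptions, not part of the diff.

    sslQuorum=true
    ssl.quorum.keyStore.location=/path/to/quorum-keystore.p12
    ssl.quorum.keyStore.password=changeit
    ssl.quorum.keyStore.type=PKCS12
    ssl.quorum.trustStore.location=/path/to/quorum-truststore.p12
    ssl.quorum.trustStore.password=changeit
    ssl.quorum.trustStore.type=PKCS12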
@@ -1522,8 +1527,8 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
* *ssl.trustStore.type* and *ssl.quorum.trustStore.type* :
(Java system properties: **zookeeper.ssl.trustStore.type** and **zookeeper.ssl.quorum.trustStore.type**)
**New in 3.5.5:**
- Specifies the file format of client and quorum trustStores. Values: JKS, PEM, PKCS12 or null (detect by filename).
- Default: null
+ Specifies the file format of client and quorum trustStores. Values: JKS, PEM, PKCS12 or null (detect by filename).
+ Default: null
* *ssl.protocol* and *ssl.quorum.protocol* :
(Java system properties: **zookeeper.ssl.protocol** and **zookeeper.ssl.quorum.protocol**)
@@ -1541,7 +1546,7 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
(Java system properties: **zookeeper.ssl.ciphersuites** and **zookeeper.ssl.quorum.ciphersuites**)
**New in 3.5.5:**
Specifies the enabled cipher suites to be used in client and quorum TLS negotiation.
- Default: Enabled cipher suites depend on the Java runtime version being used.
+ Default: Enabled cipher suites depend on the Java runtime version being used.
* *ssl.context.supplier.class* and *ssl.quorum.context.supplier.class* :
(Java system properties: **zookeeper.ssl.context.supplier.class** and **zookeeper.ssl.quorum.context.supplier.class**)
@@ -1586,19 +1591,19 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
Specifies that the client port should accept SSL connections
(using the same configuration as the secure client port).
Default: false
-
+
* *authProvider*:
(Java system property: **zookeeper.authProvider**)
You can specify multiple authentication provider classes for ZooKeeper.
Usually you use this parameter to specify the SASL authentication provider like:
`authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider`
-
+
* *kerberos.removeHostFromPrincipal*
(Java system property: **zookeeper.kerberos.removeHostFromPrincipal**)
You can instruct ZooKeeper to remove the host from the client principal name during authentication.
(e.g. the zk/[email protected] client principal will be authenticated in ZooKeeper as [email protected])
Default: false
-
+
* *kerberos.removeRealmFromPrincipal*
(Java system property: **zookeeper.kerberos.removeRealmFromPrincipal**)
You can instruct ZooKeeper to remove the realm from the client principal name during authentication.
@@ -1608,9 +1613,9 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
* *multiAddress.enabled* :
(Java system property: **zookeeper.multiAddress.enabled**)
**New in 3.6.0:**
- Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
- for each ZooKeeper server instance (this can increase availability when multiple physical
- network interfaces can be used parallel in the cluster). Setting this parameter to
+ Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
+ for each ZooKeeper server instance (this can increase availability when multiple physical
+ network interfaces can be used parallel in the cluster). Setting this parameter to
**true** will enable this feature. Please note, that you can not enable this feature
during a rolling upgrade if the version of the old ZooKeeper cluster is prior to 3.6.0.
The default value is **false**.
@@ -1618,17 +1623,17 @@ and [SASL authentication for ZooKeeper](https://cwiki.apache.org/confluence/disp
* *multiAddress.reachabilityCheckTimeoutMs* :
(Java system property: **zookeeper.multiAddress.reachabilityCheckTimeoutMs**)
**New in 3.6.0:**
- Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
- for each ZooKeeper server instance (this can increase availability when multiple physical
+ Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
+ for each ZooKeeper server instance (this can increase availability when multiple physical
network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
- or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find
+ or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find
the reachable addresses. This happens only if you provide multiple addresses in the configuration.
- In this property you can set the timeout in millisecs for the reachability check. The check happens
+ In this property you can set the timeout in millisecs for the reachability check. The check happens
in parallel for the different addresses, so the timeout you set here is the maximum time will be taken
by checking the reachability of all addresses.
The default value is **1000**.
- This parameter has no effect, unless you enable the MultiAddress feature by setting *multiAddress.enabled=true*.
+ This parameter has no effect, unless you enable the MultiAddress feature by setting *multiAddress.enabled=true*.
<a name="Experimental+Options%2FFeatures"></a>
@@ -1684,11 +1689,11 @@ the variable does.
* *jute.maxbuffer.extrasize*:
(Java system property: **zookeeper.jute.maxbuffer.extrasize**)
**New in 3.5.7:**
- While processing client requests ZooKeeper server adds some additional information into
- the requests before persisting it as a transaction. Earlier this additional information size
+ While processing client requests ZooKeeper server adds some additional information into
+ the requests before persisting it as a transaction. Earlier this additional information size
was fixed to 1024 bytes. For many scenarios, specially scenarios where jute.maxbuffer value
is more than 1 MB and request type is multi, this fixed size was insufficient.
- To handle all the scenarios additional information size is increased from 1024 byte
+ To handle all the scenarios additional information size is increased from 1024 byte
to same as jute.maxbuffer size and also it is made configurable through jute.maxbuffer.extrasize.
Generally this property is not required to be configured as default value is the most optimal value.
@@ -1708,17 +1713,17 @@ the variable does.
* *multiAddress.reachabilityCheckEnabled* :
(Java system property: **zookeeper.multiAddress.reachabilityCheckEnabled**)
**New in 3.6.0:**
- Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
- for each ZooKeeper server instance (this can increase availability when multiple physical
+ Since ZooKeeper 3.6.0 you can also [specify multiple addresses](#id_multi_address)
+ for each ZooKeeper server instance (this can increase availability when multiple physical
network interfaces can be used parallel in the cluster). ZooKeeper will perform ICMP ECHO requests
- or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find
+ or try to establish a TCP connection on port 7 (Echo) of the destination host in order to find
the reachable addresses. This happens only if you provide multiple addresses in the configuration.
- The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to
- start a large (e.g. 11+) ensemble members cluster on a single machine for testing.
-
- Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks.
- Please note, disabling the reachability check will cause the cluster not to be able to reconfigure
- itself properly during network problems, so the disabling is advised only during testing.
+ The reachable check can fail if you hit some ICMP rate-limitation, (e.g. on MacOS) when you try to
+ start a large (e.g. 11+) ensemble members cluster on a single machine for testing.
+
+ Default value is **true**. By setting this parameter to 'false' you can disable the reachability checks.
+ Please note, disabling the reachability check will cause the cluster not to be able to reconfigure
+ itself properly during network problems, so the disabling is advised only during testing.
This parameter has no effect, unless you enable the MultiAddress feature by setting *multiAddress.enabled=true*.
diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java
index 9521420..7ea7010 100644
--- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java
+++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java
@@ -232,9 +232,12 @@ public final class ServerMetrics {
NETTY_QUEUED_BUFFER = metricsContext.getSummary("netty_queued_buffer_capacity", DetailLevel.BASIC);
DIGEST_MISMATCHES_COUNT = metricsContext.getCounter("digest_mismatches_count");
+
TLS_HANDSHAKE_EXCEEDED = metricsContext.getCounter("tls_handshake_exceeded");
CNXN_CLOSED_WITHOUT_ZK_SERVER_RUNNING = metricsContext.getCounter("cnxn_closed_without_zk_server_running");
+
+ SOCKET_CLOSING_TIME = metricsContext.getSummary("socket_closing_time", DetailLevel.BASIC);
}
/**
@@ -455,6 +458,8 @@ public final class ServerMetrics {
public final Counter CNXN_CLOSED_WITHOUT_ZK_SERVER_RUNNING;
+ public final Summary SOCKET_CLOSING_TIME;
+
private final MetricsProvider metricsProvider;
public void resetAll() {
diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java
index da5f113..0b807ef 100644
--- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java
+++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java
@@ -38,6 +38,7 @@ import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import javax.net.ssl.SSLSocket;
import org.apache.jute.BinaryInputArchive;
@@ -46,10 +47,12 @@ import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;
import org.apache.jute.Record;
import org.apache.zookeeper.ZooDefs.OpCode;
+import org.apache.zookeeper.common.Time;
import org.apache.zookeeper.common.X509Exception;
import org.apache.zookeeper.server.ExitCode;
import org.apache.zookeeper.server.Request;
import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ServerMetrics;
import org.apache.zookeeper.server.TxnLogEntry;
import org.apache.zookeeper.server.ZooTrace;
import org.apache.zookeeper.server.quorum.QuorumPeer.QuorumServer;
@@ -86,6 +89,7 @@ public class Learner {
protected Socket sock;
protected MultipleAddresses leaderAddr;
+ protected AtomicBoolean sockBeingClosed = new AtomicBoolean(false);
/**
* Socket getter
@@ -116,10 +120,14 @@ public class Learner {
public static final String LEARNER_ASYNC_SENDING = "learner.asyncSending";
private static boolean asyncSending = Boolean.getBoolean(LEARNER_ASYNC_SENDING);
+ public static final String LEARNER_CLOSE_SOCKET_ASYNC = "learner.closeSocketAsync";
+ public static final boolean closeSocketAsync = Boolean.getBoolean(LEARNER_CLOSE_SOCKET_ASYNC);
+
static {
LOG.info("leaderConnectDelayDuringRetryMs: {}", leaderConnectDelayDuringRetryMs);
LOG.info("TCP NoDelay set to: {}", nodelay);
LOG.info("{} = {}", LEARNER_ASYNC_SENDING, asyncSending);
+ LOG.info("{} = {}", LEARNER_CLOSE_SOCKET_ASYNC, closeSocketAsync);
}
final ConcurrentHashMap<Long, ServerCnxn> pendingRevalidations = new ConcurrentHashMap<Long, ServerCnxn>();
@@ -337,6 +345,7 @@ public class Learner {
throw new IOException("Failed connect to " + multiAddr);
} else {
sock = socket.get();
+ sockBeingClosed.set(false);
}
self.authLearner.authenticate(sock, hostname);
@@ -847,10 +856,27 @@ public class Learner {
}
void closeSocket() {
+ if (sock != null) {
+ if (sockBeingClosed.compareAndSet(false, true)) {
+ if (closeSocketAsync) {
+ final Thread closingThread = new Thread(() -> closeSockSync(), "CloseSocketThread(sid:" + zk.getServerId());
+ closingThread.setDaemon(true);
+ closingThread.start();
+ } else {
+ closeSockSync();
+ }
+ }
+ }
+ }
+
+ void closeSockSync() {
try {
+ long startTime = Time.currentElapsedTime();
if (sock != null) {
sock.close();
+ sock = null;
}
+ ServerMetrics.getMetrics().SOCKET_CLOSING_TIME.add(Time.currentElapsedTime() - startTime);
} catch (IOException e) {
LOG.warn("Ignoring error closing connection to leader", e);
}
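The new code follows a close-once, optionally asynchronous pattern: sockBeingClosed guarantees the socket is closed by exactly one caller, and when learner.closeSocketAsync is set, the potentially slow close() runs on a daemon thread so shutdown and the next leader election are not held up. A standalone Java sketch of that pattern (hypothetical names, not ZooKeeper API) looks like this:

    import java.io.IOException;
    import java.net.Socket;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Illustration only: close a socket at most once, optionally on a daemon thread.
    class AsyncSocketCloser {

        private final AtomicBoolean beingClosed = new AtomicBoolean(false);

        void close(Socket sock, boolean async) {
            if (sock == null || !beingClosed.compareAndSet(false, true)) {
                return; // null socket, or another caller is already closing it
            }
            if (async) {
                // Daemon thread: a slow TLS close_notify exchange no longer
                // blocks the caller's shutdown path or JVM exit.
                Thread closer = new Thread(() -> closeQuietly(sock), "CloseSocketThread");
                closer.setDaemon(true);
                closer.start();
            } else {
                closeQuietly(sock);
            }
        }

        private void closeQuietly(Socket sock) {
            long start = System.currentTimeMillis();
            try {
                sock.close();
            } catch (IOException e) {
                // mirror the production code: log and ignore
            }
            System.out.println("socket closed in " + (System.currentTimeMillis() - start) + " ms");
        }
    }

In the actual change the elapsed close time feeds the new socket_closing_time summary shown in ServerMetrics.java instead of being printed.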