[ https://issues.apache.org/jira/browse/GEODE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281406#comment-17281406 ]
ASF GitHub Bot commented on GEODE-8817: --------------------------------------- Bill commented on a change in pull request #5987: URL: https://github.com/apache/geode/pull/5987#discussion_r572391356 ########## File path: geode-core/src/main/java/org/apache/geode/internal/net/SocketCloser.java ########## @@ -163,60 +164,72 @@ private Future asyncExecute(String address, Runnable runnableToExecute) { } /** - * Closes the specified socket in a background thread. In some cases we see close hang (see bug - * 33665). Depending on how the SocketCloser is configured (see ASYNC_CLOSE_WAIT_MILLISECONDS) + * Closes the specified socket in a background thread. In some cases we see close hang. + * Depending on how the SocketCloser is configured (see ASYNC_CLOSE_WAIT_MILLISECONDS) * this method may block for a certain amount of time. If it is called after the SocketCloser is * closed then a normal synchronous close is done. * * @param socket the socket to close * @param address identifies who the socket is connected to - * @param extra an optional Runnable with stuff to execute before the socket is closed + * @param runBeforeClose a Runnable with stuff to execute before the socket is closed */ - public void asyncClose(final Socket socket, final String address, final Runnable extra) { - if (socket == null || socket.isClosed()) { + @NotNull Review comment: I don't think it makes sense to add the `@NotNull` annotation to a `void` method, since the method isn't returning anything at all. ########## File path: geode-core/src/main/java/org/apache/geode/internal/cache/tier/sockets/ServerConnection.java ########## @@ -1412,11 +1412,19 @@ public boolean cleanup() { getAcceptor().decClientServerConnectionCount(); } - try { - theSocket.close(); - } catch (Exception ignored) { - } + if (!theSocket.isClosed()) { + final String closerName = + communicationMode.isWAN() ? "WANSocketCloser" : "CacheServerSocketCloser"; Review comment: `closerName` is being passed as the `address` parameter of `asyncClose()`. The doc on that method says: ``` * @param address identifies who the socket is connected to ``` Other calls to `asyncClose()` send an address. If there is a reason we don't want to send an address here, would you mind adding a comment to explain it? ########## File path: geode-core/src/main/java/org/apache/geode/distributed/internal/LonerDistributionManager.java ########## @@ -1054,14 +1055,14 @@ public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedE @Override public <T> Future<T> submit(final Callable<T> task) { - Exception ex = null; - T result = null; + CompletableFuture future = new CompletableFuture<>(); Review comment: This new code looks like an improvement. But the method is on `DummyExecutor`: question: is this class used? Because I don't see any static usages of the class. Can we remove `DummyExecutor` entirely? ########## File path: geode-core/src/main/java/org/apache/geode/internal/net/SocketCloser.java ########## @@ -163,60 +164,72 @@ private Future asyncExecute(String address, Runnable runnableToExecute) { } /** - * Closes the specified socket in a background thread. In some cases we see close hang (see bug - * 33665). Depending on how the SocketCloser is configured (see ASYNC_CLOSE_WAIT_MILLISECONDS) + * Closes the specified socket in a background thread. In some cases we see close hang. + * Depending on how the SocketCloser is configured (see ASYNC_CLOSE_WAIT_MILLISECONDS) * this method may block for a certain amount of time. If it is called after the SocketCloser is * closed then a normal synchronous close is done. * * @param socket the socket to close * @param address identifies who the socket is connected to - * @param extra an optional Runnable with stuff to execute before the socket is closed + * @param runBeforeClose a Runnable with stuff to execute before the socket is closed */ - public void asyncClose(final Socket socket, final String address, final Runnable extra) { - if (socket == null || socket.isClosed()) { + @NotNull + public void asyncClose(final Socket socket, final String address, final Runnable runBeforeClose) { + asyncClose(socket, address, runBeforeClose, () -> { + }); + } + + /** + * Closes the specified socket in a background thread. In some cases we see close hang. + * Depending on how the SocketCloser is configured (see ASYNC_CLOSE_WAIT_MILLISECONDS) + * this method may block for a certain amount of time. If it is called after the SocketCloser is + * closed then a normal synchronous close is done. + * + * @param socket the socket to close + * @param address identifies who the socket is connected to + * @param runBeforeClose a Runnable with stuff to execute before the socket is closed + * @param runAfterClose a Runnable with stuff to execute after the socket is closed + */ + @NotNull Review comment: ditto ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > server hangs in cache close with ssl enabled due to active client connection; > client side (CacheClientUpdater.close()) is hung in > SSLSocketImpl$AppInputStream.deplete() > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Key: GEODE-8817 > URL: https://issues.apache.org/jira/browse/GEODE-8817 > Project: Geode > Issue Type: Bug > Components: client/server, security > Affects Versions: 1.14.0 > Reporter: Bill Burcham > Assignee: Bill Burcham > Priority: Major > Labels: blocks-1.14.0, pull-request-available > > A proprietary TLS/SSL-enabled application encountered a network partition. A > server hangs in cache close due to active client connection; client side > ({{CacheClientUpdater.close()}}) is hung in > {{SSLSocketImpl$AppInputStream.deplete()}} > The configuration is: > {noformat} > ========================================== > losingSide | survivingSide > ========================================== > 11110 | 10627 > 11115 | 10632 > ------------------------------------------ > 11139 | 10655 > | 10662 > ------------------------------------------ > {noformat} > The stuck threads were stuck in sun's SSL code. Geode's client/Server > framework uses old I/O and that was also part of where they were stuck. If > the clients had closed their connections to the server then it would not have > been stuck here. But the server shutdown shouldn't hang because of client > that refuses to disconnect. > The Geode client-side of the connection is hung here: > {code:java} > \[warn 2020/11/06 14:56:56.577 PST <ThreadsMonitor> tid=0x18] Thread <50> > (0x32) that was executed at <06 Nov 2020 14:55:43 PST> has been stuck for > <72.81 seconds> and number of thread monitor iteration <1> > Thread Name <poolTimer-brclient-7> state <BLOCKED> > Waiting on <sun.security.ssl.SSLSocketImpl$AppInputStream@3a9b72d3> > Owned By <Cache Client Updater Thread on > 10.32.108.224(bridgep2_host2_10627:10627)<v3>:41003 port 27636> with ID <43> > Executor Group <ScheduledThreadPoolExecutorWithKeepAlive> > Monitored metric <ResourceManagerStats.numThreadsStuck> > Thread stack: > sun.security.ssl.SSLSocketImpl$AppInputStream.deplete(SSLSocketImpl.java:1016) > sun.security.ssl.SSLSocketImpl$AppInputStream.access$100(SSLSocketImpl.java:816) > sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:702) > sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:553) > sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:485) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.close(CacheClientUpdater.java:546) > org.apache.geode.cache.client.internal.QueueConnectionImpl.internalDestroy(QueueConnectionImpl.java:112) > org.apache.geode.cache.client.internal.QueueManagerImpl.endpointCrashed(QueueManagerImpl.java:379) > org.apache.geode.cache.client.internal.QueueManagerImpl.connectionCrashed(QueueManagerImpl.java:357) > org.apache.geode.cache.client.internal.QueueConnectionImpl.destroy(QueueConnectionImpl.java:88) > org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:645) > org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:504) > org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:334) > org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303) > org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839) > org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:38) > org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90) > org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:279) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > Lock owner thread stack > java.net.SocketInputStream.socketRead0(Native Method) > java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > java.net.SocketInputStream.read(SocketInputStream.java:171) > java.net.SocketInputStream.read(SocketInputStream.java:141) > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:475) > sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:469) > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:69) > sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1228) > sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:75) > sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:915) > org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:830) > org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:677) > org.apache.geode.internal.cache.tier.sockets.Message.receiveWithHeaderReadTimeout(Message.java:1140) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.processMessages(CacheClientUpdater.java:1591) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.run(CacheClientUpdater.java:483) > {code} > From the Geode log we can see the expected exceptions for messages to losing > side (Operation not permitted) after the network is dropped: > {code:java} > [warn 2020/11/06 14:50:20.865 PST <Geode Failure Detection thread 3> > tid=0x9b] Unable to send message to > 10.32.110.94(bridgep1_host1_11115:11115)<v4>:41001 > java.io.IOException: Operation not permitted (sendto failed) > at java.net.PlainDatagramSocketImpl.send(Native Method) > at java.net.DatagramSocket.send(DatagramSocket.java:693) > at org.jgroups.protocols.UDP._send(UDP.java:224) > at org.jgroups.protocols.UDP.sendUnicast(UDP.java:215) > at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1985) > at org.jgroups.protocols.TP.doSend(TP.java:1962) > at > org.apache.geode.distributed.internal.membership.gms.messenger.Transport.doSend(Transport.java:85) > at org.jgroups.protocols.TP.send(TP.java:1948) > at > org.apache.geode.distributed.internal.membership.gms.messenger.Transport._send(Transport.java:57) > at org.jgroups.protocols.TP.down(TP.java:1515) > at org.jgroups.stack.Protocol.down(Protocol.java:439) > at > org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.down(StatRecorder.java:87) > at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:646) > at org.jgroups.protocols.FlowControl.down(FlowControl.java:347) > at org.jgroups.protocols.FRAG2.down(FRAG2.java:136) > at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1039) > at org.jgroups.JChannel.down(JChannel.java:790) > at org.jgroups.JChannel.send(JChannel.java:426) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:819) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:670) > at > org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.doCheckMember(GMSHealthMonitor.java:504) > at > org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.lambda$checkMember$1(GMSHealthMonitor.java:455) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)