[ https://issues.apache.org/jira/browse/SSHD-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437233#comment-16437233 ]
Bill Kuker commented on SSHD-721:
---------------------------------

I am uncertain of the answers to these. My full project includes keepalives and checks for crashed servers and clients and bad network connections, so after a certain amount of time stuck in this state the client applications kill their SSH clients and reconnect... Once all clients with port forwards have killed themselves, sshd begins operating normally again.

Very similar to Markus's problem: a single web browser to web server connection over one port forward can get sshd into this state, where all port forwards for all clients (not just the one client with the HTTP port forward) halt. Because at this point sshd is using no CPU, allocating no memory, and executing no code, I'd be inclined to call it a deadlock and a bug rather than a performance issue.

Increasing the number of NIO workers eases my pain, but my assumption would be that for any number of NIO workers there is a client behavior that can trigger the same deadlock.

I can say this problem began when I upgraded from 0.3.0 to 1.2.0 and still occurs in 1.6.0. I am working on updating to 1.7.0, but there are some API changes, so that'll take a little longer.

There also seems to be some time-based component to it. It happens more for people bouncing from Toronto to Paris to Ottawa than it does for people testing it on the same LAN. Yay.

I guess my bottom line is that if someone from the SSHD team agrees that this is probably a bug, I'll work real hard to reproduce it with the simplest test case I can... and maybe even a fix.
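The starvation pattern suspected here can be sketched with plain JDK classes. This is not sshd code and does not claim to reproduce the actual bug: the class `PoolStarvationDemo`, the method `stalls`, and the two latches are hypothetical names chosen for illustration. The sketch assumes that each "handler" blocks a pool thread until a wake-up arrives, and that the wake-up itself has to run on the same fixed-size pool. It also simplifies by blocking without a timeout, whereas the threads in the reported dump are in TIMED_WAITING.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {

    /**
     * Submits blockingHandlers tasks that wait for a wake-up, then submits the
     * wake-up task to the same pool. Returns true if the handlers were still
     * blocked when timeoutMillis expired.
     */
    public static boolean stalls(int workers, int blockingHandlers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch wakeUp = new CountDownLatch(1);
        CountDownLatch handlersDone = new CountDownLatch(blockingHandlers);
        try {
            for (int i = 0; i < blockingHandlers; i++) {
                pool.execute(() -> {
                    try {
                        // Hypothetical stand-in for a handler that blocks its worker thread.
                        wakeUp.await();
                        handlersDone.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The wake-up needs a free worker. If every worker is already
            // blocked above, this task stays in the queue and never runs.
            pool.execute(wakeUp::countDown);
            return !handlersDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Interrupt any workers that are still blocked so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 workers, 2 blocking handlers: stalled=" + stalls(3, 2, 500));
        System.out.println("3 workers, 3 blocking handlers: stalled=" + stalls(3, 3, 500));
    }
}
```

With three workers, two blocking handlers leave one thread free to deliver the wake-up, so the first call reports `stalled=false`; three blocking handlers occupy every thread, so the second reports `stalled=true`. Raising `workers` only raises the number of concurrent blocking handlers needed to stall the pool, which matches the assumption that more NIO workers ease the problem without removing it.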
I only have SSHD-85 to my name, but the dates on that ticket show a certain persistence ;)

> deadlock: all nio workers wait to be woken up
> ---------------------------------------------
>
>                 Key: SSHD-721
>                 URL: https://issues.apache.org/jira/browse/SSHD-721
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Markus Rathgeb
>            Priority: Major
>
> I am using sshd-core for a server machine (S) that accepts incoming
> connections and port forwarding requests.
> There are client machines (C) that run servers that should be accessible
> through a tunnel to the server.
> On the client machines (C), an implementation using sshd-core is also
> running that establishes the connection to the server (S) and initiates the
> port forwarding.
> Other clients use the tunnelled connection to communicate with the
> servers that are running on the client machines (C).
> Sometimes I noticed that no data was transferred anymore (through the
> tunnels).
> All the workers reside in the waitFor function and no one wakes them up.
> {noformat}
> "sshd-SshServer[67de991c]-nio2-thread-3" - Thread t@125
>    java.lang.Thread.State: TIMED_WAITING
>         at java.lang.Object.wait(Native Method)
>         - waiting on <132c6b60> (a java.lang.Object)
>         at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>         at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>         at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>         at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>         at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>         at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - locked <6d02d7ed> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>
> "sshd-SshServer[67de991c]-nio2-thread-2" - Thread t@124
>    java.lang.Thread.State: TIMED_WAITING
>         at java.lang.Object.wait(Native Method)
>         - waiting on <7e9f4eff> (a java.lang.Object)
>         at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>         at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>         at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>         at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>         at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>         at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - locked <35fbf3e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>
> "sshd-SshServer[67de991c]-nio2-thread-1" - Thread t@122
>    java.lang.Thread.State: TIMED_WAITING
>         at java.lang.Object.wait(Native Method)
>         - waiting on <49ce93a9> (a java.lang.Object)
>         at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>         at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>         at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>         at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>         at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>         at sun.nio.ch.Invoker.invokeDirect(Invoker.java:157)
>         at sun.nio.ch.UnixAsynchronousSocketChannelImpl.implRead(UnixAsynchronousSocketChannelImpl.java:553)
>         at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:276)
>         at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:297)
>         at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:304)
>         at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:249)
>         at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:243)
>         at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:239)
>         at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:235)
>         at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:231)
>         at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:227)
>         at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:178)
>         at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:156)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>         at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>         at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>         at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - locked <7d1c59e6> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>
> "sshd-SshServer[67de991c]-timer-thread-1" - Thread t@105
>    java.lang.Thread.State: TIMED_WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <655c080c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> To eliminate some other code that could trigger that error I created a
> "minimal" example -- a simple test application -- that could be used to
> demonstrate the hang (for me it is reproducible using that code).
> Please have a look at
> https://github.com/maggu2810/sshd-deadlock/tree/first-report where you could
> also find a readme with a short description about the code.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)