[ 
https://issues.apache.org/jira/browse/SSHD-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438353#comment-16438353
 ] 

Bill Kuker commented on SSHD-721:
---------------------------------

I am zeroing in on reproducing it...

(In the following steps "web browser" and "web server" are just tiny java 
functions playing those roles)
 # Create an SSH Server, SSH Client and a port forward listening on the SSH 
Server.
 # Configure the SSH Server with *N* nio worker threads.
 # Start the "Web Server" that accepts a connection, reads some data from the 
client, writes some data to the client, closes the connection.
 # Launch *W* "Web Browsers" that each open a connection to the Web Server via 
the port forward listening on the SSH Server.
 # Use a cyclic barrier or similar to make all *W* Web Browsers connect and 
send their request simultaneously. 

(/) When *W < N* everything works fine.

(x) When *W >= N* then the SSHD Server completely stops working. It will never 
recover (Unless some outside force kills some of the processes or tcp 
connections).

My guess is that the SSH Server can handle N-1 port forwards connecting at 
once, and the 1 extra worker thread is used for communication to the SSH 
Client. Once all worker threads are handling new connections to a port forward, 
no thread is available to make progress on communication to the sshd client.

I will be providing a Junit test case once I have cleaned up my code.

Replacing the nio2 thread pool, which is created by 
ThreadUtils::newFixedThreadPool() with a pool created by 
ThreadUtils::newCachedThreadPool() seems to fix the problem, although I am not 
suggesting that as the final fix.

If my guess is correct, my suggestion for the fix would be different worker 
pools for SSH Server / SSH Client communication and for handling the port 
forwards. This would allow SSHD to operate when the number of simultaneous port 
forwarded connections being established is larger than the number of threads in 
either pool, and also allow users to tune these two thread pools separately 
depending on their usage patterns. 

 

> deadlock: all nio workers wait to be woken up
> ---------------------------------------------
>
>                 Key: SSHD-721
>                 URL: https://issues.apache.org/jira/browse/SSHD-721
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Markus Rathgeb
>            Priority: Major
>
> I am using sshd-core for a server machine (S) that accepts incoming 
> connections and port forwarding requests.
> There are client machines (C) that run servers that should be accessible by a 
> tunnel to the server.
> On the client machines (C) also an implementation using sshd-core is running 
> that establish the connection to the server (S) and initiate the port 
> forwarding.
> Other clients are using the tunnelled connection to communicate with the 
> servers that are running on the client machines (C).
> Sometimes I realized that no data is transferred anymore (through the 
> tunnels).
> All the worker reside in the waitFor function and no one wakes them up.
> {noformat}
> "sshd-SshServer[67de991c]-nio2-thread-3" - Thread t@125
>    java.lang.Thread.State: TIMED_WAITING
>       at java.lang.Object.wait(Native Method)
>       - waiting on <132c6b60> (a java.lang.Object)
>       at 
> org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>       at 
> org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>       at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>       at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>       at 
> sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - locked <6d02d7ed> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-2" - Thread t@124
>    java.lang.Thread.State: TIMED_WAITING
>       at java.lang.Object.wait(Native Method)
>       - waiting on <7e9f4eff> (a java.lang.Object)
>       at 
> org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>       at 
> org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>       at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>       at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>       at 
> sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - locked <35fbf3e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-1" - Thread t@122
>    java.lang.Thread.State: TIMED_WAITING
>       at java.lang.Object.wait(Native Method)
>       - waiting on <49ce93a9> (a java.lang.Object)
>       at 
> org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
>       at 
> org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>       at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>       at sun.nio.ch.Invoker.invokeDirect(Invoker.java:157)
>       at 
> sun.nio.ch.UnixAsynchronousSocketChannelImpl.implRead(UnixAsynchronousSocketChannelImpl.java:553)
>       at 
> sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:276)
>       at 
> sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:297)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:304)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:249)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:243)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:239)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:235)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:231)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:227)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:178)
>       at 
> org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:156)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
>       at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
>       at sun.nio.ch.Invoker$2.run(Invoker.java:218)
>       at 
> sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - locked <7d1c59e6> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-timer-thread-1" - Thread t@105
>    java.lang.Thread.State: TIMED_WAITING
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <655c080c> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
>       at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> - None
> {noformat}
> To eliminate some other code that could trigger that error I created a 
> "minimal" example -- a simple test application -- that could be used to 
> demonstrate the hang (for me it is reproducible using that code).
> Please have a look at 
> https://github.com/maggu2810/sshd-deadlock/tree/first-report where you could 
> also find a readme with a short description about the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to