[ 
https://issues.apache.org/jira/browse/SSHD-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121358#comment-15121358
 ] 

Eugene Petrenko commented on SSHD-633:
--------------------------------------

Yes, of course I do create a new thread from my command implementation. But 
according to 1.0 sources, there are two actions that are done in parallel:
* the output that I write from my a command running thread I started
* the reply to SSH request via 
{{jetbrains.ssh.server.SSHChannelSessionEx#handleRequest}} method.

Both activities are running in different thread. It may turn out the command 
thread is executed *before* 
{{jetbrains.ssh.server.SSHChannelSessionEx#handleRequest}} is finished. So it 
may start writing command output even before 
{{jetbrains.ssh.server.SSHChannelSessionEx#sendResponse}} is executed.

>From the other hand, SSH channel uses Window to limit a number of packets that 
>are send within a channel. So, this means, by chance, {{#sendResponse}} 
>message can stuck in a queue. In my case the implementation of JSCH client did 
>nothing before server replies on the request message with either success or 
>failure. Here we have a race condition with queue order. If {{#sendResponse}} 
>is executed fast enough, the message is delivered and client works. If we are 
>unlucky with OS thread scheduling, we may fill up the window with command 
>output packets, not the response. 

I agree, and I believe I wrote it, I did an incorrect test. But this test 
simply shows the possible thread scheduling order that leads JSCH client to 
stuck. At least the was helpful to me to reproduce similar error I saw in 
production. 

I know about NIO_WORKERS and MAX_PACKET_SIZE settings. I updated the 
beforehand. Unfortunately, it did not workout. I also realised the faster 
command implementation works the higher probability of JSCH issue is. According 
to the protocol, SSH client may not request the server to replay on request 
command. In that case there is no deadlock possible.


To fix the issue, I added additional synchronisation in my code. I created an 
inheritor of ChannelSession as follows
{code}public class SSHChannelSessionEx extends ChannelSession {
  private final ReentrantLock myProcessSessionLock = new ReentrantLock();

  @Override
  public void handleRequest(Buffer buffer) throws IOException {
    myProcessSessionLock.lock();
    try {
      super.handleRequest(buffer);
    } finally {
      myProcessSessionLock.unlock();
    }
  }

  @Override
  protected void sendResponse(Buffer buffer, String req, RequestHandler.Result 
result, boolean wantReply) throws IOException {
    final ByteArrayBuffer newBuffer = new ByteArrayBuffer(5 + 1 + (Integer.SIZE 
/ Byte.SIZE));
    //it is suspiciously strange that input buffer gets re-used for response. 
It's way better to have a copy here
    super.sendResponse(newBuffer, req, result, wantReply);
  }

  public void waitBeforeCommandStartIsAllowed() throws InterruptedException {
    if (myProcessSessionLock.isHeldByCurrentThread()) {
      throw new Error("Should not be called from Command#start method!");
    }

    //make sure it is not locked and wait for it to continue
    myProcessSessionLock.lock();
    myProcessSessionLock.unlock();
  }
}
{code}

Next, I implemented {{ChannelSessionAware}} in my command implementation. Next 
I added a call to a session {{waitBeforeCommandStartIsAllowed()}} method from 
command running thread BEFORE any output is written. 

The possible fix in the library could be to make sure 
{{jetbrains.ssh.server.SSHChannelSessionEx#sendResponse}} is called before a 
command's {{#start}} method is called.  


> Race condition in command execution and SSH_MSG_CHANNEL_SUCCESS
> ---------------------------------------------------------------
>
>                 Key: SSHD-633
>                 URL: https://issues.apache.org/jira/browse/SSHD-633
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eugene Petrenko
>            Priority: Critical
>
> We use the library for production. From time to time we see timeout 
> exceptions from clients that calls our SSH server. It was not clear what is 
> that. Most of the time I saw command suck reading STDIN. We use JSCH as 
> client here. The problem was reproducing rarely.
> I connect JSCH 1.51 to SSHD server to execute a command
> JSCH client expects to read {{SSH_MSG_CHANNEL_SUCCESS}} or 
> {{SSH_MSG_CHANNEL_FAILURE}} as response to {{SSH_MSG_CHANNEL_REQUEST}} call 
> to execute command. 
> SSHD implementation calls command's method {{start}} and than posts reply 
> with success to the channel. It may easily command thread is able to fill 
> send window with DATA messages. Thus {{SSH_MSG_CHANNEL_SUCCESS}} reply is not 
> delivered. 
> This makes JSCH to wait for the message and fail on timeout. The SSHD server 
> command is simply suck on reading stdin.
> Here goes the code I managed to reproduce the issue. What I did is to start 
> sending a huge data block without creating a thread in command's start 
> method. Of course this is a bit incorrect, but it easily make it fail. 
> The server contains the following command implementation
> {code}
>     final OutputStream out = getOut();
>     out.write(new byte[32 * 1024 * 1024]);
>     out.flush();
>     new Thread(new Runnable() {
>       @Override
>       public void run() {
>         try {
>           getIn().read();
>           onExit(0);
>         } catch (Throwable e) {
>           onExit(1, e.getMessage());
>         }
>       }
>     }).start();
> {code} 
> Jsch client code is follows
> {code}
>         final JSch j = new JSch();
>         final Session session = j.getSession("jonnyzzz", 
> myResource.getHostname(), myResource.getSSHPort());
>         session.setTimeout(60_000);
>         session.connect();
>         final ChannelExec e = (ChannelExec) session.openChannel("exec");
>         e.setCommand("test-buffer-underrun");
>         final InputStream inputStream = new e.getInputStream();
>         e.connect(10_000); //meaningful timeout to reproduce the bug
>         ByteStreams.copy(inputStream, FileUtil.nullOutputStream());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to