[ https://issues.apache.org/jira/browse/CLOUDSTACK-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252719#comment-15252719 ]

ASF subversion and git services commented on CLOUDSTACK-8611:
-------------------------------------------------------------

Commit d518b619dda69dde4ecc1640e6c007182c9a9b75 in cloudstack's branch 
refs/heads/master from [[email protected]]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=d518b61 ]

Merge pull request #1459 from GabrielBrascher/CLOUDSTACK-8611

This closes #561

CLOUDSTACK-8611: Handle SSH if server "forget" to send exit status

Continuing the work started by @likitha, I did not cherry-pick the
commit (b9181c689e0e7b5f1e28c81d73710196dfabd0ba) from PR
<https://github.com/apache/cloudstack/pull/561> because the path of that
SshHelper class differs from that of the current SshHelper. Cherry-picking
it would have made it look as though I had changed the whole class, since
the code comes from another file.

I made some changes to the code from that commit, adding @wilderrodrigues's
suggestions (create simple methods for reusable code, write unit tests, and
create the `WAITING_OPEN_SSH_SESSION` variable to hold the delay of 1000
milliseconds).

I also tried to simplify the logic by assuming that...

    if ((conditions & ChannelCondition.EXIT_STATUS) != 0) {
        if ((conditions & (ChannelCondition.STDOUT_DATA | ChannelCondition.STDERR_DATA)) == 0) {
            break;
        }
    }

...is equivalent to `((conditions & ChannelCondition.EXIT_STATUS) != 0) && ((conditions & (ChannelCondition.STDOUT_DATA | ChannelCondition.STDERR_DATA)) == 0)`.
Evaluating this expression with each condition flag set on its own gives the
following results:

| Condition   | Value (binary) | Result   |
|-------------|----------------|----------|
| TIMEOUT     | 0000001        | false    |
| CLOSED      | 0000010        | false    |
| STDOUT_DATA | 0000100        | false    |
| STDERR_DATA | 0001000        | false    |
| EOF         | 0010000        | false    |
| EXIT_STATUS | 0100000        | **true** |
| EXIT_SIGNAL | 1000000        | false    |
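
The table can be reproduced with a small Java sketch. `ConditionTable` is a hypothetical demo class, not part of CloudStack: the flag values are copied from the table (they match trilead-ssh2's `ChannelCondition` constants) and are redefined locally so the example compiles without the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: flag values mirror the table above and are
// redefined here so the sketch does not depend on trilead-ssh2.
final class ConditionTable {
    static final int TIMEOUT = 1;
    static final int CLOSED = 2;
    static final int STDOUT_DATA = 4;
    static final int STDERR_DATA = 8;
    static final int EOF = 16;
    static final int EXIT_STATUS = 32;
    static final int EXIT_SIGNAL = 64;

    // The nested check from the loop, written as a single boolean expression.
    static boolean nestedCheck(int conditions) {
        return (conditions & EXIT_STATUS) != 0
                && (conditions & (STDOUT_DATA | STDERR_DATA)) == 0;
    }

    // Flags in the same order as the table rows.
    static Map<String, Integer> flags() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("TIMEOUT", TIMEOUT);
        m.put("CLOSED", CLOSED);
        m.put("STDOUT_DATA", STDOUT_DATA);
        m.put("STDERR_DATA", STDERR_DATA);
        m.put("EOF", EOF);
        m.put("EXIT_STATUS", EXIT_STATUS);
        m.put("EXIT_SIGNAL", EXIT_SIGNAL);
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : flags().entrySet()) {
            // Left-pad the binary form with zeros to 7 digits, as in the table.
            String bits = String.format("%7s", Integer.toBinaryString(e.getValue())).replace(' ', '0');
            System.out.println(e.getKey() + " | " + bits + " | " + nestedCheck(e.getValue()));
        }
    }
}
```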

After testing each condition in the table, we can see that the check
`(conditions & ChannelCondition.EXIT_STATUS) != 0` alone is sufficient; thus,
the simplified `if` statement can be:

    if ((conditions & ChannelCondition.EXIT_STATUS) != 0) {
        break;
    }
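
The table evaluates each condition on its own, whereas the value returned by `waitForCondition` is a bitmask in which several conditions can be set at once. The sketch below (hypothetical class name, same bit values as the table) compares the nested check with the simplified one over all 128 combinations of the seven flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare the nested and simplified checks over every
// combination of the seven condition bits (masks 0..127).
final class SimplificationCheck {
    static final int STDOUT_DATA = 4;
    static final int STDERR_DATA = 8;
    static final int EXIT_STATUS = 32;

    static boolean nested(int conditions) {
        return (conditions & EXIT_STATUS) != 0
                && (conditions & (STDOUT_DATA | STDERR_DATA)) == 0;
    }

    static boolean simplified(int conditions) {
        return (conditions & EXIT_STATUS) != 0;
    }

    // Masks on which the two checks give different answers.
    static List<Integer> disagreements() {
        List<Integer> out = new ArrayList<>();
        for (int mask = 0; mask < 128; mask++) {
            if (nested(mask) != simplified(mask)) {
                out.add(mask);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Single flags (the rows of the table): both checks agree.
        for (int bit = 1; bit < 128; bit <<= 1) {
            System.out.println(bit + ": nested=" + nested(bit) + " simplified=" + simplified(bit));
        }
        // Combined flags: EXIT_STATUS together with pending output.
        System.out.println("disagreements: " + disagreements().size());
    }
}
```

Both checks agree on every single flag, but they disagree on 48 masks: exactly those where EXIT_STATUS is reported together with STDOUT_DATA or STDERR_DATA. In those cases the nested form keeps looping while output is still pending, and the simplified form stops.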

The motivation for this work is best explained by quoting @likitha:
>CheckS2SVpnConnectionsCommand execution involves executing a script 
>(checkbatchs2svpn.sh) in the virtual router. Once CS has opened a session to a 
>virtual router and executed a script in the router, it waits indefinitely till 
>the session either times out or the exit status of the remote process is 
>available. But it is possible that an EOF is reached by the process in the 
>router and the router never set the exit status.

>References:
>1. Some servers never send the exit status, or occasionally "forget" to do so (http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson/trilead-ssh2/build212-hudson-1/com/trilead/ssh2/ChannelCondition.java).
>2. Get the exit code/status from the remote command - if available. Be careful - not all server implementations return this value (http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson/trilead-ssh2/build212-hudson-1/com/trilead/ssh2/Session.java#Session.waitForCondition%28int%2Clong%29).
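
The problem described in the quote can be illustrated with a wait loop that does not rely on EXIT_STATUS alone. This is a minimal sketch under stated assumptions, not the code in this commit: `Channel` is a hypothetical stand-in for an SSH session (in trilead-ssh2 the analogous call is `Session.waitForCondition(int, long)`), and the loop treats a timeout, a closed channel, an exit status, or an EOF with no pending output as the end of the command.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not CloudStack's SshHelper. Bit values follow the table above.
final class ExitWaitSketch {
    static final int TIMEOUT = 1;
    static final int CLOSED = 2;
    static final int STDOUT_DATA = 4;
    static final int STDERR_DATA = 8;
    static final int EOF = 16;
    static final int EXIT_STATUS = 32;

    // Hypothetical stand-in for an SSH session: returns the current condition bitmask.
    interface Channel {
        int waitForCondition(int conditionSet, long timeoutMs);
    }

    // Loops until the command can be considered finished; returns the last bitmask seen.
    static int waitForEnd(Channel ch, long timeoutMs) {
        int wanted = STDOUT_DATA | STDERR_DATA | EOF | EXIT_STATUS | CLOSED;
        while (true) {
            int conditions = ch.waitForCondition(wanted, timeoutMs);
            if ((conditions & (TIMEOUT | CLOSED | EXIT_STATUS)) != 0) {
                return conditions; // timed out, channel closed, or exit status received
            }
            // The server reached EOF but "forgot" the exit status: stop anyway
            // once no more output is pending.
            if ((conditions & EOF) != 0
                    && (conditions & (STDOUT_DATA | STDERR_DATA)) == 0) {
                return conditions;
            }
            // Output is pending; a real caller would drain stdout/stderr here.
        }
    }

    public static void main(String[] args) {
        // Scripted server: sends output, then EOF, and never an exit status.
        Deque<Integer> script = new ArrayDeque<>();
        script.add(STDOUT_DATA);
        script.add(STDOUT_DATA | EOF);
        script.add(EOF);
        Channel ch = (set, t) -> script.isEmpty() ? TIMEOUT : script.poll();
        System.out.println("ended with conditions=" + waitForEnd(ch, 1000));
    }
}
```

In `main`, the scripted server never reports an exit status, yet the loop still returns once EOF arrives with no pending output instead of waiting indefinitely.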

* pr/1459:
  Handle SSH if server "forget" to send exit status

Signed-off-by: Will Stevens <[email protected]>


> CS waits indefinitely for CheckS2SVpnConnectionsCommand to return
> -----------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8611
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8611
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>            Reporter: Likitha Shetty
>            Assignee: Suresh Kumar Anaparti
>             Fix For: 4.9.0
>
>
> In one instance, CS began to execute CheckS2SVpnConnectionsCommand on a
> router, but the command result was never returned to the MS. If a command
> never returns, the 'DirectAgent' thread executing it is blocked
> indefinitely and cannot pick up any other request.
> Since this command is designed to execute in sequence on a host and is run
> regularly, every later execution of the command on that host picked up a
> DirectAgent thread and waited for the previous execution to complete. Over
> time, the host used up and blocked all 'DirectAgent' threads indefinitely.
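
The thread starvation described above can be modeled in a few lines of Java. This is a simplified, hypothetical model rather than the management server's actual code: a fixed thread pool stands in for the 'DirectAgent' threads, a `ReentrantLock` stands in for the per-host sequencing of the command, and a `CountDownLatch` that is only released during cleanup plays the hung SSH session.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: the pool plays the 'DirectAgent' threads and the lock
// plays "this command executes in sequence on a host".
final class PoolExhaustionDemo {

    // Returns true if an unrelated request could not get a thread within probeWaitMs.
    static boolean poolExhausted(int poolSize, long probeWaitMs) throws Exception {
        ExecutorService directAgent = Executors.newFixedThreadPool(poolSize);
        ReentrantLock hostLock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch hungSsh = new CountDownLatch(1); // released only during cleanup

        // First execution: takes the host lock, then waits for an exit status that never comes.
        directAgent.submit(() -> {
            hostLock.lock();
            try {
                lockHeld.countDown();
                hungSsh.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hostLock.unlock();
            }
        });
        lockHeld.await(); // ensure the hung execution owns the lock before the others start

        // Later periodic executions on the same host: each occupies a thread and blocks on the lock.
        for (int i = 1; i < poolSize; i++) {
            directAgent.submit(() -> {
                hostLock.lock();
                hostLock.unlock();
            });
        }

        // Any other request now waits in the queue for a free thread.
        Future<?> probe = directAgent.submit(() -> { });
        try {
            probe.get(probeWaitMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } finally {
            hungSsh.countDown(); // unblock everything so the pool can shut down
            directAgent.shutdown();
            directAgent.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool exhausted: " + poolExhausted(3, 200));
    }
}
```

With a pool of three threads, one hung execution plus two queued executions on the same host leave no thread for any other request, so the probe times out. Making the SSH wait terminate, which is what this commit addresses, removes the hung first execution that the rest of the queue depends on.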



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
