[
https://issues.apache.org/jira/browse/CLOUDSTACK-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252718#comment-15252718
]
ASF subversion and git services commented on CLOUDSTACK-8611:
-------------------------------------------------------------
Commit d518b619dda69dde4ecc1640e6c007182c9a9b75 in cloudstack's branch
refs/heads/master from [[email protected]]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=d518b61 ]
Merge pull request #1459 from GabrielBrascher/CLOUDSTACK-8611
This closes #561
CLOUDSTACK-8611: Handle SSH if server "forget" to send exit status
Continuing the work started by @likitha, I did not cherry-pick the
commit (b9181c689e0e7b5f1e28c81d73710196dfabd0ba) from PR
<https://github.com/apache/cloudstack/pull/561> because the path of that
SshHelper class differs from that of the current SshHelper. Cherry-picking it
would make it look as if I had changed the whole class, since the code comes
from another file.
I made some changes to that commit, incorporating @wilderrodrigues's
suggestions: extract simple methods for reusable code, add unit tests, and
introduce the `WAITING_OPEN_SSH_SESSION` variable to hold the
1000-millisecond delay.
Also, I tried to simplify the logic, starting from the observation that
```java
if ((conditions & ChannelCondition.EXIT_STATUS) != 0) {
    if ((conditions & (ChannelCondition.STDOUT_DATA | ChannelCondition.STDERR_DATA)) == 0) {
        break;
    }
}
```
is equivalent to `((conditions & ChannelCondition.EXIT_STATUS) != 0) &&
((conditions & (ChannelCondition.STDOUT_DATA | ChannelCondition.STDERR_DATA))
== 0)`. Evaluating this expression with each condition flag set on its own
gives the following results:
| Condition   | Value (binary) | Result   |
|-------------|----------------|----------|
| TIMEOUT     | 0000001        | false    |
| CLOSED      | 0000010        | false    |
| STDOUT_DATA | 0000100        | false    |
| STDERR_DATA | 0001000        | false    |
| EOF         | 0010000        | false    |
| EXIT_STATUS | 0100000        | **true** |
| EXIT_SIGNAL | 1000000        | false    |
After testing each condition flag individually, we can see that the check
`(conditions & ChannelCondition.EXIT_STATUS) != 0` is sufficient; thus, the
simplified `if` statement can be:
```java
if ((conditions & ChannelCondition.EXIT_STATUS) != 0) {
    break;
}
```
The motivation for this work is best explained by quoting @likitha:
>CheckS2SVpnConnectionsCommand execution involves executing a script
>(checkbatchs2svpn.sh) in the virtual router. Once CS has opened a session to a
>virtual router and executed a script in the router, it waits indefinitely till
>the session either times out or the exit status of the remote process is
>available. But it is possible for the process in the router to reach EOF
>without the router ever setting the exit status.
>References -
>1. Some servers never send the exit status, or occasionally "forget" to do so
>(http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson/trilead-ssh2/build212-hudson-1/com/trilead/ssh2/ChannelCondition.java).
>2. Get the exit code/status from the remote command - if available. Be careful -
>not all server implementations return this value
>(http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson/trilead-ssh2/build212-hudson-1/com/trilead/ssh2/Session.java#Session.waitForCondition%28int%2Clong%29).
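One way to avoid the indefinite wait described in the quote is to treat EOF (or a closed channel) as the end of output and then wait only a bounded time for the exit status. The sketch below illustrates that idea under stated assumptions and is not necessarily the code merged in PR #1459. `ConditionWaiter` is a hypothetical stand-in for trilead's `Session.waitForCondition(int, long)`, the flag values are local copies of `ChannelCondition`, and `exitStatusGraceMs` is an illustrative parameter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExitStatusWait {
    // Local copies of the ChannelCondition flag values (assumed).
    static final int TIMEOUT = 1, CLOSED = 2, STDOUT_DATA = 4, STDERR_DATA = 8,
            EOF = 16, EXIT_STATUS = 32, EXIT_SIGNAL = 64;

    /** Hypothetical stand-in for com.trilead.ssh2.Session#waitForCondition(int, long). */
    interface ConditionWaiter {
        int waitForCondition(int conditionSet, long timeoutMs);
    }

    /**
     * Returns true if the server reported an exit status, false if output
     * ended (EOF or channel closed) without one. Throws if the main wait times out.
     */
    static boolean waitForCompletion(ConditionWaiter session, long timeoutMs, long exitStatusGraceMs) {
        int wanted = STDOUT_DATA | STDERR_DATA | EOF | EXIT_STATUS | CLOSED;
        while (true) {
            int conditions = session.waitForCondition(wanted, timeoutMs);
            if ((conditions & TIMEOUT) != 0) {
                throw new IllegalStateException("timed out waiting for remote command");
            }
            if ((conditions & EXIT_STATUS) != 0) {
                return true;
            }
            if ((conditions & (EOF | CLOSED)) != 0) {
                // Output is finished: give the server a bounded chance to send
                // the exit status instead of waiting for it forever.
                int late = session.waitForCondition(EXIT_STATUS, exitStatusGraceMs);
                return (late & EXIT_STATUS) != 0;
            }
            // Only output data is pending; real code would drain stdout/stderr here.
        }
    }

    /** Fake session that replays fixed masks, then reports TIMEOUT like an expired wait. */
    static ConditionWaiter scripted(int... masks) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int mask : masks) {
            queue.add(mask);
        }
        return (conditionSet, timeoutMs) -> queue.isEmpty() ? TIMEOUT : queue.poll();
    }

    public static void main(String[] args) {
        // Server sends output and EOF but "forgets" the exit status.
        boolean forgot = waitForCompletion(scripted(STDOUT_DATA, STDOUT_DATA | EOF), 1000, 1000);
        // Well-behaved server: EOF together with the exit status.
        boolean wellBehaved = waitForCompletion(scripted(STDOUT_DATA, EOF | EXIT_STATUS), 1000, 1000);
        System.out.println("forgot=" + forgot + " wellBehaved=" + wellBehaved);
    }
}
```

The design choice is that EOF ends the wait for output, while the exit status becomes optional: a missing status is reported as `false` rather than blocking the caller.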
* pr/1459:
Handle SSH if server "forget" to send exit status
Signed-off-by: Will Stevens <[email protected]>
> CS waits indefinitely for CheckS2SVpnConnectionsCommand to return
> -----------------------------------------------------------------
>
> Key: CLOUDSTACK-8611
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8611
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Reporter: Likitha Shetty
> Assignee: Suresh Kumar Anaparti
> Fix For: 4.9.0
>
>
> In one instance, CS began to execute CheckS2SVpnConnectionsCommand on
> a router, but the command result was never returned to the MS. If a command
> never returns, the 'DirectAgent' thread executing it is blocked
> indefinitely and cannot pick up any other request.
> Since this command is designed to execute in sequence on a host and is
> run regularly, every subsequent execution of it on that particular
> host picked up a DirectAgent thread and waited for the previous
> execution to complete. Hence, over time, the host ended up using and
> blocking all 'DirectAgent' threads indefinitely.
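The starvation described in the issue can be reproduced in miniature with a fixed thread pool. This sketch rests on several assumptions: the pool stands in for the 'DirectAgent' threads, a `ReentrantLock` stands in for the per-host sequencing of CheckS2SVpnConnectionsCommand, and a `CountDownLatch` that is only released during cleanup stands in for the exit status that never arrives. The class name, pool size, and `otherRequestStarved` method are hypothetical and do not reflect CloudStack's actual agent code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class DirectAgentStarvation {
    /**
     * Submits hungRuns executions of a per-host command to a pool of poolSize
     * threads; the first execution to take the host lock never returns. Then
     * checks whether an unrelated request is served within waitMs.
     * Returns true if that request was starved.
     */
    static boolean otherRequestStarved(int poolSize, int hungRuns, long waitMs) {
        ExecutorService directAgent = Executors.newFixedThreadPool(poolSize);
        ReentrantLock hostLock = new ReentrantLock();      // runs on one host execute in sequence
        CountDownLatch exitStatus = new CountDownLatch(1); // the exit status that never arrives
        try {
            for (int i = 0; i < hungRuns; i++) {
                directAgent.submit(() -> {
                    hostLock.lock();        // later runs park here behind the hung one
                    try {
                        exitStatus.await(); // the run holding the lock blocks here
                    } finally {
                        hostLock.unlock();
                    }
                    return null;
                });
            }
            Future<String> other = directAgent.submit(() -> "served");
            try {
                other.get(waitMs, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            exitStatus.countDown(); // release the blocked runs so the pool can shut down
            directAgent.shutdown();
            try {
                directAgent.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("2 of 3 threads tied up, starved: " + otherRequestStarved(3, 2, 300));
        System.out.println("3 of 3 threads tied up, starved: " + otherRequestStarved(3, 3, 300));
    }
}
```

With two of three threads tied up, the unrelated request still finds a free thread. Once as many executions are stuck as there are threads, it times out, which matches the situation the reporter describes.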
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)