[
https://issues.apache.org/jira/browse/BROOKLYN-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324327#comment-15324327
]
Svetoslav Neykov commented on BROOKLYN-298:
-------------------------------------------
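Adding an SSH keep-alive should detect this: once keep-alive messages stop being answered, the client can treat the connection as dead instead of waiting indefinitely.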
http://stackoverflow.com/questions/10351484/how-to-keep-ssh-connections-alive-using-sshj
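To illustrate the mechanism only, here is a minimal sketch of a keep-alive check. It is deliberately not sshj's API: the class name KeepAliveMonitor, the probe callback and the maxMissed threshold are all hypothetical. The idea is that each round sends one probe, and after a configurable number of consecutive unanswered probes the connection is declared dead.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical keep-alive monitor. This is NOT sshj's API; it only
 * illustrates the mechanism of counting unanswered probes.
 */
final class KeepAliveMonitor {
    private final BooleanSupplier probe; // sends one keep-alive; true if the peer answered
    private final int maxMissed;         // consecutive misses tolerated before giving up
    private int missed = 0;
    private boolean dead = false;

    KeepAliveMonitor(BooleanSupplier probe, int maxMissed) {
        if (maxMissed < 1) {
            throw new IllegalArgumentException("maxMissed must be >= 1");
        }
        this.probe = probe;
        this.maxMissed = maxMissed;
    }

    /** Runs one keep-alive round; returns false once the connection is considered dead. */
    boolean tick() {
        if (dead) {
            return false; // stay dead; do not probe again
        }
        if (probe.getAsBoolean()) {
            missed = 0; // any answer resets the counter
        } else if (++missed >= maxMissed) {
            dead = true;
        }
        return !dead;
    }
}
```

In a real client, tick() would presumably run on a timer, and a false result would close the connection so that any thread blocked waiting on the channel is released with an error rather than hanging.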
> sshj hangs (waiting for shell to finish) after script completed - maybe VPN
> went down+up during exec
> ----------------------------------------------------------------------------------------------------
>
> Key: BROOKLYN-298
> URL: https://issues.apache.org/jira/browse/BROOKLYN-298
> Project: Brooklyn
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Aled Sage
>
> I was deploying an app whose launch command started docker and pulled an
> image. The task hung, showing in the web-console:
> {noformat}
> In progress - SSH executing, launching
> VanillaSoftwareProcessImpl{id=nisq2gz4yi}
> {noformat}
> I believe this is because my VPN disconnected and then reconnected, and our
> sshj command keeps waiting for the result - even though the command has
> finished executing.
> Looking at the target VM, the command has completed (and the script uploaded
> by SshjTool has been deleted). There is no evidence of any Brooklyn-initiated
> commands executing, according to {{ps aux}}.
> Drilling into the activity view in the Brooklyn web-console, the currently
> executing thread shows:
> {noformat}
> SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}
> Task[ssh: launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}]@TPnVc8Qs
> Submitted by SoftlyPresent[value=Task[launch (main)]@mvL4OvdH]
> In progress, thread waiting (timed) on
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@408df99d
> At: net.schmizz.concurrent.Promise.tryRetrieve(Promise.java:168)
> net.schmizz.concurrent.Promise.retrieve(Promise.java:137)
> net.schmizz.concurrent.Event.await(Event.java:103)
> net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1012)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:925)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:630)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:616)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$1.run(SshjTool.java:331)
> org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:326)
> org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:82)
> org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:166)
> org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:164)
> org.apache.brooklyn.util.pool.BasicPool.exec(BasicPool.java:146)
> org.apache.brooklyn.location.ssh.SshMachineLocation.execSsh(SshMachineLocation.java:611)
> org.apache.brooklyn.location.ssh.SshMachineLocation$13.execWithTool(SshMachineLocation.java:790)
> org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:164)
> org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:80)
> org.apache.brooklyn.location.ssh.SshMachineLocation.execScript(SshMachineLocation.java:774)
> org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:272)
> org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:366)
> org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
> org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:285)
> org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
> org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
> {noformat}
> Running {{netstat -antp TCP}} on my local machine, I still see an established
> ssh connection:
> {noformat}
> tcp4 0 0 10.104.3.10.54535 10.104.1.193.22 ESTABLISHED
> {noformat}
> I do *not* see a corresponding entry when I run {{sudo netstat -anp}} on the
> target VM.
> ---
> Looking in the Brooklyn code at {{SshjTool$ShellAction.create}}, I wonder
> what else we could call on sshj to check if our connection is ok and/or the
> command has actually completed. We are already calling {{shell.isOpen()}} and
> {{session.getExitStatus()!=null}}. We could add calls to
> {{session.isOpen()}}, {{session.getExitSignal()}} and/or
> {{session.getExitWasCoreDumped()}}.
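The polling idea in the last paragraph of the description could be sketched roughly as below. This is not the actual SshjTool code: CompletionPoller, awaitExit and their parameters are hypothetical, and the two callbacks merely stand in for checks such as {{session.getExitStatus()}} and {{shell.isOpen()}} / {{session.isOpen()}}. The sketch also adds an overall deadline, on the assumption that locally cached channel state may keep reporting "open" for a half-open TCP connection like the one shown by netstat above.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper; the callbacks stand in for checks on the real session. */
final class CompletionPoller {
    private CompletionPoller() {}

    /**
     * Polls until an exit status is available, the channel reports closed,
     * or the deadline passes, whichever happens first.
     *
     * @param exitStatus  returns null while the command has not reported a status
     * @param channelOpen returns false once the channel is known to be closed
     */
    static int awaitExit(Supplier<Integer> exitStatus, BooleanSupplier channelOpen,
                         long pollMillis, long timeoutMillis)
            throws IOException, TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            Integer status = exitStatus.get();
            if (status != null) {
                return status; // command finished normally
            }
            if (!channelOpen.getAsBoolean()) {
                throw new IOException("channel closed before an exit status arrived");
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("no exit status within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

The deadline is only a backstop. For long-running launch scripts a fixed timeout is hard to choose, so a connection-level liveness check is probably still needed alongside it.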
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)