[ https://issues.apache.org/jira/browse/BROOKLYN-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275234#comment-14275234 ]

Aled Sage commented on BROOKLYN-106:
------------------------------------

According to `sudo sysctl -A`, the default settings on the Brooklyn VM are as
shown below. This means it will take 7200 + (9*75) = 7875 seconds to detect a
timeout, i.e. about 2 hours 11 mins, but the debug log shows we got a timeout
after 17 mins 4 secs.

{noformat}
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
{noformat}
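As a sanity check on the arithmetic above, here is a small Java sketch of the usual Linux keepalive model: the first probe is sent after `tcp_keepalive_time` seconds of idleness, then one probe every `tcp_keepalive_intvl` seconds, and the connection is declared dead after `tcp_keepalive_probes` unanswered probes. The class and method names are hypothetical, and the model ignores retransmission of unacknowledged data, which is governed by different settings.

```java
public class KeepaliveMath {
    /**
     * Worst-case seconds of idleness before the kernel declares a
     * keepalive-enabled connection dead: idle time before the first probe,
     * plus one interval per unanswered probe.
     */
    static long detectionSeconds(long keepaliveTime, long probes, long interval) {
        return keepaliveTime + probes * interval;
    }

    /** Formats a number of seconds as "Hh Mm Ss". */
    static String format(long seconds) {
        return String.format("%dh %dm %ds", seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    }

    public static void main(String[] args) {
        // Default values reported by `sysctl -A` on the VM.
        long defaults = detectionSeconds(7200, 9, 75);
        System.out.println(defaults + " s = " + format(defaults)); // prints "7875 s = 2h 11m 15s"
    }
}
```

With the defaults this gives 7875 s (2h 11m 15s), far longer than the 17m 4s after which the timeout was actually observed.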

According to http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html, shorter
keepalive settings should also make the NAT server (i.e. the vcloud-director
Edge Gateway) less likely to terminate the seemingly idle connection, because
the periodic probes keep it from looking idle.

We still see the issue after running the command below, but applying it is
probably still a good idea.

{noformat}
sudo sysctl -w net.ipv4.tcp_keepalive_time=30 net.ipv4.tcp_keepalive_probes=6 net.ipv4.tcp_keepalive_intvl=10
{noformat}
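The kernel values set by that command only apply to sockets that have `SO_KEEPALIVE` enabled; whether the SSH library's socket enables it is not established in this comment. As a hedged illustration, on JDK 11+ a Java client can enable keepalive and set the same three parameters for a single socket via `jdk.net.ExtendedSocketOptions`, without changing system-wide settings. This sketch assumes Linux and a JDK that supports these options; the helper name is hypothetical and this is not how sshj or Brooklyn configures its sockets.

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions; // assumption: JDK 11+, where TCP_KEEP* options exist

public class SocketKeepalive {
    /**
     * Enables TCP keepalive on the socket and, where the platform supports it,
     * overrides the system-wide tcp_keepalive_* defaults for this socket only.
     * Returns true if the per-socket values were applied.
     */
    static boolean enableKeepalive(Socket socket, int idleSecs, int intervalSecs, int probeCount)
            throws IOException {
        socket.setKeepAlive(true); // SO_KEEPALIVE: without it, no probes are sent at all
        boolean supported = socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                && socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                && socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPCOUNT);
        if (!supported) {
            return false; // fall back to the kernel-wide sysctl values
        }
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSecs);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSecs);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
        return true;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) { // unconnected: options are set before connect()
            // Same values as the sysctl command: a dead peer is detected after ~30 + 6*10 = 90 s.
            boolean applied = enableKeepalive(socket, 30, 10, 6);
            System.out.println("keepalive=" + socket.getKeepAlive() + ", per-socket values applied=" + applied);
        }
    }
}
```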

My various sshj fixes (long polling, appropriate timeouts and retries) seem to
make things work OK now. However, I'm really worried that any command could
still fail: we haven't wrapped everything in retries, so we have really only
narrowed the window for errors.
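A minimal, generic Java sketch of what "timeouts and retries" around a command can look like: each attempt is bounded by a timeout, a hung or failed attempt is abandoned and retried, and the last failure is surfaced if every attempt fails. This is not Brooklyn's or sshj's code; `callWithRetries` and the flaky command are hypothetical. Retrying blindly is only safe for commands that can be re-run without harm.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    /**
     * Runs task up to maxAttempts times. Each attempt gets timeoutMillis to finish;
     * an attempt that times out is cancelled (its thread is interrupted) and retried,
     * as is one that throws. The last failure is rethrown if every attempt fails.
     */
    static <T> T callWithRetries(Callable<T> task, int maxAttempts, long timeoutMillis,
                                 ExecutorService pool) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> future = pool.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // give up on the hung attempt
                last = e;
            } catch (ExecutionException e) {
                last = e; // the attempt itself threw
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky command: attempt 1 hangs, attempt 2 fails, attempt 3 succeeds.
        Callable<String> flakyCommand = () -> {
            int n = calls.incrementAndGet();
            if (n == 1) Thread.sleep(10_000);
            if (n == 2) throw new IllegalStateException("transient failure");
            return "succeeded on attempt " + n;
        };
        try {
            System.out.println(callWithRetries(flakyCommand, 3, 500, pool));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

One design caveat: `future.cancel(true)` only interrupts the worker thread. On an ordinary platform thread, a blocking `java.io` socket read is generally not woken by an interrupt, so a hung read also needs a socket-level timeout (for example `Socket.setSoTimeout`) to free the thread.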

> ssh command hangs (gettin stdout/stderr) for vcloud-director
> ------------------------------------------------------------
>
>                 Key: BROOKLYN-106
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-106
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.7.0-SNAPSHOT
>            Reporter: Aled Sage
>            Assignee: Aled Sage
>         Attachments: debug.log.tgz, jstack.txt, messages.tgz, ssh-stdout.txt
>
>
> When deploying Tomcat to a CentOS 6.4 VM on VMware's vcloud-air, it hangs
> while installing Java!
> The Brooklyn web-console shows that it is still waiting for a result from the 
> ssh command (which executed `sudo -E -n -S -- yum -y --nogpgcheck install 
> java-1.7.0-openjdk-devel`).
> However, when logging into the VM I can see that the `yum` command has 
> finished, and the /var/log/messages (attached) shows that the install 
> completed.
> This fails repeatedly. It used to pass!
> The stdout stops at 32040 bytes. The last few lines of the stdout (as shown in
> the web-console) are:
> {noformat}
>   Installing : libtasn1-2.3-6.el6_5.x86_64                                50/56
>   Installing : gnutls-2.8.5-14.el6_5.x86_64                               51/56
>   Installing : 1:cups-libs-1.4.2-67.el6.x86_64                            52/56
> {noformat}
> Could some buffer be limited to 32K, so that it gets stuck and never reads the
> rest of the stdout? (Although
> `SshjToolPerformanceTest.testConsecutiveBigStdoutCommands` passes.)
> Why else would our ssh command be stuck, not returning?
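The 32K-buffer hypothesis describes a classic flow-control stall: if a fixed-size buffer between the remote command and the reader fills up and nobody drains it, the writer blocks and the command never appears to finish. A minimal Java sketch, assuming a hypothetical 32 KB `PipedInputStream` as a stand-in for that buffer (it is not sshj's actual channel or window): draining the stream on its own thread lets the writer push far more than 32 KB through. In a real client, stdout and stderr would each need their own drainer, since a full stderr buffer can stall the writer just as a full stdout buffer can.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DrainSketch {
    /** Reads the stream to EOF on a pool thread, collecting every byte. */
    static Future<byte[]> drainAsync(InputStream in, ExecutorService pool) {
        return pool.submit(() -> {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        });
    }

    /**
     * Pushes totalBytes through a pipe whose buffer holds only pipeSize bytes.
     * Without the concurrent drainer, the writer would block once the buffer filled.
     */
    static int pump(int pipeSize, int totalBytes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try (PipedInputStream stdout = new PipedInputStream(pipeSize);
             PipedOutputStream producer = new PipedOutputStream(stdout)) {
            Future<byte[]> drained = drainAsync(stdout, pool);
            Future<Object> writer = pool.submit(() -> {
                byte[] chunk = new byte[1024];
                Arrays.fill(chunk, (byte) 'x');
                for (int written = 0; written < totalBytes; written += chunk.length) {
                    producer.write(chunk, 0, Math.min(chunk.length, totalBytes - written));
                }
                producer.close(); // signals EOF to the reader
                return null;
            });
            writer.get(10, TimeUnit.SECONDS);
            return drained.get(10, TimeUnit.SECONDS).length;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // 200 KB through a 32 KB buffer completes because the reader keeps draining.
        System.out.println(pump(32 * 1024, 200 * 1024)); // prints 204800
    }
}
```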



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
