
I'm experiencing the same problems with EC2 slaves.
We're using a custom Amazon Linux AMI with slaves that terminate after 30 minutes of inactivity, instance type c3.large.
At seemingly random moments, slaves lose connectivity.
Sometimes the slaves run fine for a while; sometimes several lose connectivity in a row.
Symptoms:
We experimented with ClientAliveInterval 15 in the sshd config on the slave; it didn't help.
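For reference, this is roughly what that experiment looked like in /etc/ssh/sshd_config on the slave (a sketch; ClientAliveCountMax is shown at the OpenSSH default of 3, which we did not change):

```
# /etc/ssh/sshd_config on the slave (sketch of what we tried)
# Probe the client over the encrypted channel every 15 seconds.
ClientAliveInterval 15
# OpenSSH default: after 3 missed probes (~45 s total) sshd logs
# "Timeout, client not responding." and closes the session.
ClientAliveCountMax 3
```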
I added process list logging to see what happens.
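The process-list logging was along these lines (a hypothetical sketch; in practice it ran periodically, and the loop count here is just for illustration):

```shell
# Append a timestamped process snapshot to a file so the moment the
# slave JVM disappears can be correlated with /var/log/secure.
LOG=/tmp/slave-proc-watch.log
for _ in 1 2 3; do
  {
    date '+%b %e %T'
    # [j]ava trick: the grep process itself does not match the pattern
    ps -ef | grep '[j]ava' || echo 'no java slave process found'
  } >> "$LOG"
  sleep 1
done
```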
The slave process disappears without anything strange being noticeable (except for a disconnect on the master).
This means that either the slave Java process terminated unexpectedly, or the ssh connection was closed after a timeout.
Looking at the logging, the latter seems to be happening. Around the second that the slave process disappears from the process list, the following logging appears in /var/log/secure:
Feb 3 11:24:43 ip-10-4-33-150 sshd[2243]: Timeout, client not responding.
Feb 3 11:24:43 ip-10-4-33-150 sshd[2241]: pam_unix(sshd:session): session closed for user ec2-user
That means that sshd is terminating the connection.
On another build environment with practically the same setup (Ubuntu AMI), we don't see the disconnects.
I compared the two sshd config files on the slaves.
Noticeable difference:
The next thing we're going to try is to remove ClientAliveInterval and enable "TCPKeepAlive yes" on the Amazon Linux slave.
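Sketched against /etc/ssh/sshd_config, the planned change looks like this (this mirrors what we intend to try, not a verified fix):

```
# /etc/ssh/sshd_config on the Amazon Linux slave (planned change, untested)
# Stop sshd's application-level liveness probes...
#ClientAliveInterval 15
# ...and rely on TCP-level keepalives instead, so the kernel keeps the
# connection alive rather than sshd deciding the client is gone.
TCPKeepAlive yes
```

sshd needs a reload after editing (something like `service sshd reload` on Amazon Linux) for the change to take effect.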