It seems to be the monitoring that gets the agents disconnected.
I got this in my log file the last time they were disconnected:
Jul 17, 2019 11:58:22 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Timer-3450/103166) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.util.Timer.<init>(Timer.java:160)
    at java.util.Timer.<init>(Timer.java:132)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
Jul 17, 2019 11:58:31 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Thread-30062/98187) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at com.trilead.ssh2.transport.TransportManager.sendAsynchronousMessage(TransportManager.java:649)
    at com.trilead.ssh2.channel.ChannelManager.msgChannelRequest(ChannelManager.java:1213)
    at com.trilead.ssh2.channel.ChannelManager.handleMessage(ChannelManager.java:1466)
    at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:809)
    at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
    at java.lang.Thread.run(Thread.java:748)
Now I have a catastrophic failure: I cannot relaunch any agents any more.
[07/17/19 12:04:10] [SSH] Opening SSH connection to jbssles120x64r12.spacetec.no:22.
ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:545)
    at com.trilead.ssh2.Connection.connect(Connection.java:774)
    at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:817)
    at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:419)
    at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[07/17/19 12:04:10] Launch failed - cleaning up connection
[07/17/19 12:04:10] [SSH] Connection closed.
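In case it helps anyone else chasing this: since the limit being hit is native threads, it is worth seeing which kind of thread is piling up. A quick sketch using /proc on Linux (PID=$$ is just a placeholder for this demo; point it at the real Jenkins master PID):

```shell
# Group the per-thread command names of a process to see which thread type
# accumulates over time. PID=$$ is a placeholder; replace it with the
# Jenkins master PID, e.g. PID=$(pgrep -f jenkins.war).
PID=$$
sort /proc/"$PID"/task/*/comm | uniq -c | sort -rn | head
```

Running that every few minutes and diffing the counts should show whether it is the sse-gateway Timer threads or the trilead SSH threads that grow; `jstack <PID>` gives the same information with full thread names if a JDK is installed.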
My Jenkins server has over 500 threads open:
Threads: 506 total, 0 running, 506 sleeping, 0 stopped, 0 zombie
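One more thing I would double-check (an assumption on my side, since I do not know how your Jenkins is started): limits set in /etc/security/limits.conf do not apply to services started by systemd, so the running process may still have the old "max processes" value even after the CloudBees-suggested changes. You can read the limits the live process actually got from /proc (again, PID=$$ is only a placeholder for the Jenkins master PID):

```shell
# Show the effective limits of the running process and its current native
# thread count. PID=$$ is a placeholder; use the real Jenkins master PID.
PID=$$
grep -E 'Max (processes|open files)' /proc/"$PID"/limits
ls /proc/"$PID"/task | wc -l   # current number of native threads
```

If the values are wrong for a systemd service, they have to be raised with LimitNPROC/LimitNOFILE in the unit file rather than in limits.conf.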
On Wednesday, July 17, 2019 at 10:24:12 UTC+2, Sverre Moe wrote:
>
> We have had two blissful days of stable Jenkins. Today two nodes are
> disconnected and they will not come back online.
>
> What is strange is that it is the same two or three nodes every time.
> Running disconnect on them through the URL
> http://jenkins.example.com/jenkins/computer/NODE_NAME/disconnect, does
> not work.
> I have to open the node configuration, save it, then relaunch to get them up and running.
>
> I tried setting the ulimit values as suggested in
>
> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#bulimitsettingsjustforlinuxos
>
> I have also added additional JVM options as suggested in
>
> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#ajavaparameters
> https://go.cloudbees.com/docs/solutions/jvm-troubleshooting/
>
> The number of threads on the Jenkins server is currently 265. Yesterday,
> when all was fine, it was up to 300.
>
>
> Maybe related or unrelated:
> When this happens, some builds on other nodes stop working.
> They are aborted, but still show as running. The only thing that
> works is deleting the agent and creating it again, or restarting
> Jenkins.
>
>
> On Sunday, July 14, 2019 at 13:31:51 UTC+2, Sverre Moe wrote:
>>
>> I suspected it might be related, but was not sure.
>>
>> The odd thing is that this only started being a problem a week ago. Nothing,
>> as far as I can see, has changed on the Jenkins server.
>>
>> On Saturday, July 13, 2019 at 13:04:44 UTC+2, Ivan Fernandez Calvo wrote:
>>>
>>> I saw that you have another question related to OOM errors in Jenkins.
>>> If it is the same instance, that is your real issue with the agents;
>>> until you have a stable Jenkins instance, the agent disconnections
>>> will be a side effect.