It seems to be the monitoring that is causing the agents to disconnect.

I got this in my log file the last time they were disconnected.

Jul 17, 2019 11:58:22 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Timer-3450/103166) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at java.util.Timer.<init>(Timer.java:160)
        at java.util.Timer.<init>(Timer.java:132)
        at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
        at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
        at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

Jul 17, 2019 11:58:31 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Thread-30062/98187) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at com.trilead.ssh2.transport.TransportManager.sendAsynchronousMessage(TransportManager.java:649)
        at com.trilead.ssh2.channel.ChannelManager.msgChannelRequest(ChannelManager.java:1213)
        at com.trilead.ssh2.channel.ChannelManager.handleMessage(ChannelManager.java:1466)
        at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:809)
        at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
        at java.lang.Thread.run(Thread.java:748)


Now I have a catastrophic failure. I cannot relaunch any agents any more.

[07/17/19 12:04:10] [SSH] Opening SSH connection to jbssles120x64r12.spacetec.no:22.
ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:545)
        at com.trilead.ssh2.Connection.connect(Connection.java:774)
        at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:817)
        at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:419)
        at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[07/17/19 12:04:10] Launch failed - cleaning up connection
[07/17/19 12:04:10] [SSH] Connection closed.


My Jenkins server has over 500 threads open:
Threads: 506 total,   0 running, 506 sleeping,   0 stopped,   0 zombie
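Since each Java thread is backed by a native OS thread, "unable to create new native thread" usually means an OS-level limit (per-user process/thread limit or memory for thread stacks) is exhausted rather than the JVM heap. A quick sketch for checking the live thread count against the relevant limits (the `pgrep` pattern for the Jenkins process is an assumption, adjust it for your install):

```shell
# Per-user limit on processes; native threads count against this
ulimit -u

# System-wide ceiling on threads
cat /proc/sys/kernel/threads-max

# Live native-thread count of the Jenkins master process
# (the "jenkins.war" pattern is an assumption; adjust for your install)
pid=$(pgrep -f jenkins.war | head -n1)
if [ -n "$pid" ]; then
  ps -o nlwp= -p "$pid"
else
  echo "Jenkins process not found"
fi
```

If the thread count is approaching `ulimit -u` for the user running Jenkins, raising that limit (or finding what is leaking threads, e.g. via a thread dump) would be the next step.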


On Wednesday, 17 July 2019 at 10:24:12 UTC+2, Sverre Moe wrote:
>
> We have had two blissful days of stable Jenkins. Today two nodes are 
> disconnected and they will not come back online.
>
> What is strange is that it is the same two or three nodes every time.
> Running disconnect on them through the URL 
> http://jenkins.example.com/jenkins/computer/NODE_NAME/disconnect does 
> not work.
> I have to enter the configuration, save, then relaunch to get them up and running.
>
> I tried setting the ulimit values as suggested in
>
> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#bulimitsettingsjustforlinuxos
>
> I have also added additional JVM options as suggested in
>
> https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#ajavaparameters
> https://go.cloudbees.com/docs/solutions/jvm-troubleshooting/
>
> The Jenkins server currently has 265 threads. Yesterday, when all was 
> fine, it was up to 300.
>
>
> Maybe related, maybe not:
> When this happens, some builds on other nodes stop working. They are 
> aborted, but still show as running. The only thing that works is 
> deleting the agent and creating it again, or restarting Jenkins.
>
>
> On Sunday, 14 July 2019 at 13:31:51 UTC+2, Sverre Moe wrote:
>>
>> I suspected it might be related, but was not sure.
>>
>> The odd thing is that this only started being a problem a week ago. As 
>> far as I can see, nothing has changed on the Jenkins server.
>>
>> On Saturday, 13 July 2019 at 13:04:44 UTC+2, Ivan Fernandez Calvo wrote:
>>>
>>> I saw that you have another question related to OOM errors in Jenkins. 
>>> If it is the same instance, that is your real issue with the agents: 
>>> until you have a stable Jenkins instance, the agent disconnections will 
>>> be a side effect.
