I created a Pipeline job that runs jstack every 10 minutes (it still runs on the Jenkins master, since that is where the Jenkins process is running).
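
For going through the collected thread dumps, something like the following might help. It is only a minimal sketch in Python, assuming each build's jstack output has been saved as plain text and that thread header lines start with the thread name in double quotes, as jstack normally prints them. The function name count_thread_groups is made up for illustration. It groups threads by name, ignoring trailing counters, so a growing group such as "EventDispatcher.retryProcessor" should stand out when dumps are compared over time.

```python
import re
from collections import Counter

# A jstack thread header line starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Trailing counters such as "-12", " #12" or " 12" are stripped for grouping.
TRAILING_NUMBER = re.compile(r'[\s#-]*\d+$')


def count_thread_groups(dump_text):
    """Count threads in one jstack dump, grouped by name without its counter.

    Hypothetical helper for illustration; it only looks at header lines
    and ignores the stack frames below them.
    """
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            name = match.group("name")
            # Fall back to the full name if stripping leaves nothing.
            group = TRAILING_NUMBER.sub("", name) or name
            counts[group] += 1
    return counts
```

Calling count_thread_groups(text).most_common(10) on each saved dump gives the ten largest thread groups, which can then be compared from one build to the next.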

On Wednesday, 14 August 2019 at 16:07:02 UTC+2, Félix Belzunce Arcos wrote:
>
> Hi Sverre Moe,
>
> I am the person who talked to you this morning :-)
>
> The long-term solution is to stop building on the master. That avoids
> performance issues and the need to increase the number of processes and
> open files on the machine where the Jenkins master is located. Building
> on the master is also not recommended from a security point of view.
>
> The short-term solution would be to increase the limit on new processes
> on this machine and to take thread dumps from the master every 10
> minutes. For this, you can create a cron-triggered freestyle job that
> runs every 10 minutes and executes jstack <JENKINS_PID>. When the issue
> happens, you can look at the latest 10 builds with their thread dumps
> and try to figure out what is actually consuming so many threads on the
> master.
>
> I hope this helps,
>
>
> On Wednesday, 14 August 2019, 15:38:17 (UTC+2), Devin Nusbaum wrote:
>>
>> I have not read the whole thread in detail, but the “Unable to create
>> new native thread” OutOfMemoryErrors from your original thread, where
>> one of the stack traces involves
>> org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing,
>> look like they could be related to
>> https://issues.jenkins-ci.org/browse/JENKINS-58684, which is a thread
>> leak caused by the SSE Gateway Plugin. You could try reverting the SSE
>> Gateway Plugin to version 1.17 to see if that helps, although that
>> might reintroduce a different, somewhat rarer memory leak
>> (https://issues.jenkins-ci.org/browse/JENKINS-51057).
>>
>> To test my hypothesis, if you are running SSE Gateway Plugin version
>> 1.19, you can collect thread dumps over time and check whether you have
>> a large number of threads named “EventDispatcher.retryProcessor”
>> (unfortunately, in version 1.18 and below the threads are automatically
>> named “Timer #n”, which is less useful). That would confirm that you
>> are hitting JENKINS-58684
>> <https://issues.jenkins-ci.org/browse/JENKINS-58684>.
>>
>> The advice to stop building on master is definitely a good idea as well.
>>
>> On Aug 14, 2019, at 07:11, Sverre Moe <[email protected]> wrote:
>>
>> We got 30 minutes of free CloudBees support. It was too short to dig
>> deeper and find the problem, but the person I talked to (after
>> examining our logs) mentioned what he thought the problem was and gave
>> a suggestion.
>>
>> We should not use the Jenkins master at all for builds (allocated with
>> the node("master") step). We had 15 executors on the Jenkins master.
>>
>> We could also try to increase the hard nofile and nproc limits for the
>> jenkins user, but the main recommendation was to remove all executors
>> from the Jenkins master.
>>
>> /etc/security/limits.conf
>> jenkins soft core unlimited
>> jenkins hard core unlimited
>> jenkins soft fsize unlimited
>> jenkins hard fsize unlimited
>> jenkins soft nofile 4096
>> jenkins hard nofile 10240 #Was 8192
>> jenkins soft nproc 30654
>> jenkins hard nproc 60654 #Was 30654
>>
>>
>> Removing the Jenkins master executors will take some time. We use the
>> Jenkins master when we publish our build artifact RPMs to our NFS file
>> storage. Since our RPM NFS is only attached to the Jenkins master, that
>> is not possible at the moment, unless we build on any other agent and
>> then SCP the RPM artifacts onto the Jenkins master.
>>
>>
>> We had a few other cases where we used the Jenkins master, such as
>> checking out a file to determine which build agent to actually use.
>> These I have already changed to use any available build agent instead.
>>
>> On Tuesday, 6 August 2019 at 09:48:50 UTC+2, Sverre Moe wrote:
>>>
>>> Sadly I was mistaken. We do not use NFS for JENKINS_HOME.
>>>
>>> We do however use NFS for the location where builds copy the RPM build
>>> artifacts.
>>>
>>> On Monday, 5 August 2019 at 22:17:46 UTC+2, Ivan Fernandez Calvo wrote:
>>>>
>>>> Hi,
>>>>
>>>> Sverre has another email thread open, I think it is the same Jenkins
>>>> instance
>>>> https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer.
>>>>
>>>> I dunno what happens on your instance but probably it isn’t better
>>>> that you open another email thread with the description of your issue
>>>
>>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Jenkins Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com
>> <https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com?utm_medium=email&utm_source=footer>.
