We have now been running Jenkins for 3 weeks without this OutOfMemoryError. We did four things:

1) Reduced the master executors from 15 to 4.
2) Moved some job steps that ran on "master" to a build agent instead. We still have one stage/step that needs to run on master.
3) Configured many of our build agents to stay offline and come online on demand.
4) Upgraded our Jenkins server. The old server was running SLES12. We set up a new VM with SLES15 and copied JENKINS_HOME over to the new server.

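For the periodic thread dumps discussed in the quoted messages below, here is a minimal sketch of how the saved jstack output could be summarised to see which thread names pile up over time. It is only an illustration: Python is chosen for convenience, the function name count_thread_names and the sample dump are made up, and it assumes the usual jstack layout where each thread's header line starts with the thread name in double quotes.

```python
import re
from collections import Counter

# Assumption: each thread in jstack output has a header line that starts
# with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def count_thread_names(dump_text):
    """Count threads per name in one thread dump.

    Digits are collapsed to "N" so numbered threads such as "Timer-0" and
    "Timer-1" (the default java.util.Timer names) land in the same bucket.
    """
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            counts[re.sub(r"\d+", "N", match.group(1))] += 1
    return counts


# Made-up sample resembling jstack output, for illustration only.
sample_dump = "\n".join([
    '"EventDispatcher.retryProcessor" #101 daemon prio=5 waiting on condition',
    '   java.lang.Thread.State: TIMED_WAITING (parking)',
    '"EventDispatcher.retryProcessor" #102 daemon prio=5 waiting on condition',
    '"Timer-3" #55 daemon prio=5 in Object.wait()',
    '"main" #1 prio=5 runnable',
])

for name, count in count_thread_names(sample_dump).most_common():
    print(count, name)
```

Running this over each saved dump and comparing the counts between builds should make a steadily growing thread name stand out.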
On Wednesday, 14 August 2019 at 16:17:06 UTC+2, Sverre Moe wrote:
>
> I created a Pipeline job to run jstack every 10 minutes (running on the
> Jenkins master, since that is where Jenkins itself is running).
>
> On Wednesday, 14 August 2019 at 16:07:02 UTC+2, Félix Belzunce Arcos wrote:
>>
>> Hi Sverre Moe,
>>
>> I am the person who talked to you this morning :-)
>>
>> The long-term solution is to stop building on the master. That avoids
>> performance issues and the need to increase the limits on processes and
>> open files on the machine where the Jenkins master is located. Building on
>> the master is also not recommended from a security point of view.
>>
>> The short-term solution would be to increase the limit on new processes on
>> this machine and take thread dumps from the master every 10 minutes. For
>> this, you can create a cron-triggered freestyle job that runs every 10
>> minutes and executes jstack <JENKINS_PID>. When the issue happens, you can
>> look at the latest 10 builds with their thread dumps and try to figure out
>> what is actually consuming so many threads on the master.
>>
>> I hope this helps,
>>
>>
>> On Wednesday, 14 August 2019 at 15:38:17 (UTC+2), Devin Nusbaum wrote:
>>>
>>> I have not read the whole thread in detail, but the “Unable to create
>>> new native thread” OutOfMemoryErrors from your original thread, where one
>>> of the stack traces involves
>>> org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing,
>>> look like they could be related to
>>> https://issues.jenkins-ci.org/browse/JENKINS-58684, which is a thread
>>> leak caused by the SSE Gateway Plugin. You could try reverting the SSE
>>> Gateway Plugin to version 1.17 to see if that helps, although that might
>>> reintroduce a different, somewhat rarer memory leak
>>> (https://issues.jenkins-ci.org/browse/JENKINS-51057).
>>> To test my hypothesis, if you are running SSE Gateway Plugin version
>>> 1.19, you can collect thread dumps over time and see whether you have a
>>> large number of threads named “EventDispatcher.retryProcessor”
>>> (unfortunately, in version 1.18 and below the threads are automatically
>>> named “Timer #n”, which is less useful). That would confirm that you are
>>> hitting JENKINS-58684
>>> <https://issues.jenkins-ci.org/browse/JENKINS-58684>.
>>>
>>> The advice to stop building on master is definitely a good idea as well.
>>>
>>> On Aug 14, 2019, at 07:11, Sverre Moe <sver...@gmail.com> wrote:
>>>
>>> We got a free 30-minute CloudBees support session. It was too short to
>>> dig deeper and find the problem, but the person I was talking to (after
>>> examining our logs) mentioned what he thought the problem was and gave a
>>> suggestion.
>>>
>>> We should not use the Jenkins master for builds at all (allocated with
>>> the node("master") step). We had 15 executors on the Jenkins master.
>>>
>>> We could also try to increase the hard nofile and nproc limits for the
>>> jenkins user, but the main recommendation was to remove all executors
>>> from the Jenkins master.
>>>
>>> > /etc/security/limits.conf
>>> jenkins soft core unlimited
>>> jenkins hard core unlimited
>>> jenkins soft fsize unlimited
>>> jenkins hard fsize unlimited
>>> jenkins soft nofile 4096
>>> jenkins hard nofile 10240 #Was 8192
>>> jenkins soft nproc 30654
>>> jenkins hard nproc 60654 #Was 30654
>>>
>>>
>>> Removing the Jenkins master executors will take some time. We use the
>>> Jenkins master when we publish our build artifacts (RPMs) to our NFS file
>>> storage. Since our RPM NFS share is only attached to the Jenkins master,
>>> this is not possible at the moment, unless we use another agent and then
>>> SCP the RPM artifacts onto our Jenkins master.
>>>
>>>
>>> We had a few other cases where we used the Jenkins master, such as
>>> checking out a file to determine which build agent to actually use.
>>> These I have already changed to use any available build agent instead.
>>>
>>> On Tuesday, 6 August 2019 at 09:48:50 UTC+2, Sverre Moe wrote:
>>>>
>>>> Sadly, I was mistaken. We do not use NFS for JENKINS_HOME.
>>>>
>>>> We do, however, use NFS for the location where builds copy the RPM build
>>>> artifacts.
>>>>
>>>> On Monday, 5 August 2019 at 22:17:46 UTC+2, Ivan Fernandez Calvo wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sverre has another email thread open; I think it is about the same
>>>>> Jenkins instance:
>>>>> https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer
>>>>>
>>>>> I don't know what is happening on your instance, but it is probably not
>>>>> a good idea to open another email thread with the description of your
>>>>> issue.
>>>>
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Jenkins Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to jenkins...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com
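
As a side note on the /etc/security/limits.conf entries quoted above, here is a small sketch of how such entries could be sanity-checked after editing, for example to catch a soft limit that ends up above its hard limit. The helper names parse_limits and soft_above_hard are made up for this illustration, Python is used only for convenience, and the sketch assumes the plain four-column "domain type item value" layout with "#" comments. It does not read the limits actually applied to the running Jenkins process.

```python
import math

# limits.conf accepts these spellings for "no limit".
NO_LIMIT = {"unlimited", "infinity", "-1"}


def parse_limits(text, domain):
    """Return {(type, item): value} for one domain, e.g. the jenkins user."""
    limits = {}
    for raw_line in text.splitlines():
        # Drop trailing comments such as "#Was 8192", then split the columns.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) == 4 and fields[0] == domain:
            _, limit_type, item, value = fields
            limits[(limit_type, item)] = (
                math.inf if value in NO_LIMIT else int(value)
            )
    return limits


def soft_above_hard(limits):
    """List the items whose soft limit is higher than their hard limit."""
    return sorted(
        item
        for (limit_type, item), soft in limits.items()
        if limit_type == "soft" and soft > limits.get(("hard", item), math.inf)
    )


# The nofile and nproc values quoted in this thread.
quoted = """
jenkins soft nofile 4096
jenkins hard nofile 10240 #Was 8192
jenkins soft nproc 30654
jenkins hard nproc 60654 #Was 30654
"""

jenkins_limits = parse_limits(quoted, "jenkins")
print(jenkins_limits)
print(soft_above_hard(jenkins_limits))
```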