We have now been running Jenkins for 3 weeks without this OutOfMemoryError. We did four things:

1) Reduced the master executors from 15 to 4.
2) Moved some job steps that ran on "master" to a build agent instead. We still have one stage/step that needs to run on master.
3) Configured many of our build agents to be offline and come online on demand.
4) Upgraded our Jenkins server. The old server was running SLES12. We set up a new VM with SLES15 and copied JENKINS_HOME over to this new server.

On Wednesday, 14 August 2019 at 16:17:06 UTC+2, Sverre Moe wrote:
>
> I created a Pipeline job to run jstack every 10 minutes (running on the
> Jenkins master, since that is where Jenkins itself is running).
>
> On Wednesday, 14 August 2019 at 16:07:02 UTC+2, Félix Belzunce Arcos wrote:
>>
>> Hi Sverre Moe,
>>
>> I am the person who talked to you this morning :-)
>>
>> The long-term solution is to avoid building on the master. That avoids
>> performance issues and the need to increase the number of processes and
>> open files on the machine where the Jenkins master is located. Building
>> on the master is also not recommended from a security point of view.
>>
>> The short-term solution would be to increase the limit on new processes
>> on this machine and to take thread dumps from the master every 10 minutes.
>> For this, you can create a cron-triggered freestyle job that runs every
>> 10 minutes and executes jstack <JENKINS_PID>. When the issue happens, you
>> can look at the latest 10 builds with their thread dumps and try to
>> figure out what is actually consuming so many threads on the master.
>>
>> I hope this helps,
>>
>>
>> On Wednesday, 14 August 2019, 15:38:17 (UTC+2), Devin Nusbaum wrote:
>>>
>>> I have not read the whole thread in detail, but the “Unable to create
>>> new native thread” OutOfMemoryErrors from your original thread, where one
>>> of the stack traces involves
>>> org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing,
>>> look like they could be related to
>>> https://issues.jenkins-ci.org/browse/JENKINS-58684, which is a thread
>>> leak caused by the SSE Gateway Plugin. You could try reverting the SSE
>>> Gateway Plugin to version 1.17 to see if that helps, although that might
>>> reintroduce a different, somewhat rarer memory leak
>>> (https://issues.jenkins-ci.org/browse/JENKINS-51057).
>>> To test my hypothesis, if you are running SSE Gateway Plugin version
>>> 1.19, you can collect thread dumps over time and check whether you have a
>>> large number of threads named “EventDispatcher.retryProcessor”
>>> (unfortunately, in version 1.18 and below the threads are automatically
>>> named “Timer #n”, which is less useful). That would confirm that you are
>>> hitting JENKINS-58684
>>> <https://issues.jenkins-ci.org/browse/JENKINS-58684>.
>>>
>>> The advice to stop building on master is definitely a good idea as well.
>>>
>>> On Aug 14, 2019, at 07:11, Sverre Moe wrote:
>>>
>>> We got a free 30-minute CloudBees support session. It was too short to
>>> dig deeper and find the problem, but the person I was talking to (after
>>> examining our logs) mentioned what he thought was the problem and gave a
>>> suggestion.
>>>
>>> We should not use the Jenkins master for builds at all (allocated with
>>> the node("master") step). We had 15 executors on the Jenkins master.
>>>
>>> We could also try to increase the hard nofile and nproc limits for the
>>> jenkins user, but the main recommendation was to remove all executors
>>> from the Jenkins master.
>>>
>>> /etc/security/limits.conf
>>> jenkins soft core unlimited
>>> jenkins hard core unlimited
>>> jenkins soft fsize unlimited
>>> jenkins hard fsize unlimited
>>> jenkins soft nofile 4096
>>> jenkins hard nofile 10240 #Was 8192
>>> jenkins soft nproc 30654
>>> jenkins hard nproc 60654 #Was 30654
>>>
>>>
>>> Removing the Jenkins master executors will take some time. We use the
>>> Jenkins master when we publish our build artifact RPMs to our NFS file
>>> storage. Since our RPM NFS share is only attached to the Jenkins master,
>>> that is not possible at the moment, unless we use another agent and then
>>> SCP the RPM artifacts onto our Jenkins master.
>>>
>>>
>>> We had a few other circumstances where we used the Jenkins master, like
>>> checking out a file to determine which build agent to actually use.
>>> I have already changed these to use any available build agent instead.
>>>
>>> On Tuesday, 6 August 2019 at 09:48:50 UTC+2, Sverre Moe wrote:
>>>>
>>>> Sadly I was mistaken. We do not use NFS for JENKINS_HOME.
>>>>
>>>> We do, however, use NFS for the location where builds copy the RPM build
>>>> artifacts.
>>>>
>>>> On Monday, 5 August 2019 at 22:17:46 UTC+2, Ivan Fernandez Calvo wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sverre has another email thread open; I think it is about the same
>>>>> Jenkins instance:
>>>>> https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer
>>>>>
>>>>> I don't know what is happening on your instance, but it is probably not
>>>>> better to open another email thread with the description of your issue.
>>>>
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Jenkins Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com
