We have now been running Jenkins for 3 weeks without this OutOfMemoryError.

We did four things.
1) Reduced master executors from 15 to 4.
2) Moved some job steps off "master" so they run on a build agent instead. 
We still have one stage/step that needs to run on master.
3) Configured many of our build agents to be offline and come online on 
demand.
4) Upgraded our Jenkins server: the old server was running SLES12. We set up 
a new VM with SLES15 and copied JENKINS_HOME over to this new server.

On Wednesday, 14 August 2019 at 16:17:06 UTC+2, Sverre Moe wrote:
>
> I created a Pipeline job to run jstack every 10 minutes (it has to run on 
> the Jenkins master, since that is where the Jenkins process is running).
>
> On Wednesday, 14 August 2019 at 16:07:02 UTC+2, Félix Belzunce Arcos wrote:
>>
>> Hi Sverre Moe,
>>
>> I am the person who talked to you this morning :-)
>>
>> The long-term solution is to avoid building on the master. That avoids 
>> performance issues and the need to increase the number of processes and open 
>> files on the machine where the Jenkins master is located. Building on the 
>> master is also not recommended from a security point of view.
>>
>> The short-term solution would be to increase the limit on new processes on 
>> this machine and take thread dumps from the master every 10 minutes. For 
>> this, you can create a cron-triggered freestyle job that runs every 10 
>> minutes and executes jstack <JENKINS_PID>. When the issue happens, you can 
>> look at the latest 10 builds with their thread dumps and try to figure out 
>> what is actually consuming so many threads on the master.
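
As an alternative to a Jenkins job, the same periodic capture could be done by a small script started from the system cron. The following is only a minimal sketch, assuming `jstack` is on the PATH and the script runs as the same user as the Jenkins process; the helper names and the `jstack-<timestamp>.txt` file naming are made up for illustration.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def dump_path(out_dir, when):
    """Build a timestamped file name so that dumps sort chronologically."""
    return Path(out_dir) / f"jstack-{when:%Y%m%d-%H%M%S}.txt"


def take_thread_dump(pid, out_dir):
    """Run `jstack <pid>` and store its output in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = dump_path(out_dir, datetime.now())
    # check=True raises CalledProcessError if jstack exits with an error.
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    target.write_text(result.stdout)
    return target


def latest_dumps(out_dir, count=10):
    """Return the newest `count` dump files, oldest first."""
    return sorted(Path(out_dir).glob("jstack-*.txt"))[-count:]
```

Calling `take_thread_dump(<JENKINS_PID>, "/some/dir")` every 10 minutes and reading `latest_dumps("/some/dir")` after the error occurs gives the same "last 10 dumps" view as the job-based approach.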
>>
>> I hope this helps,
>>
>>
>> On Wednesday, 14 August 2019 at 15:38:17 (UTC+2), Devin Nusbaum wrote:
>>>
>>> I have not read the whole thread in detail, but the “Unable to create 
>>> new native thread” OutOfMemoryErrors from your original thread, where one 
>>> of the stack traces involves 
>>> org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing, 
>>> look like they could be related to 
>>> https://issues.jenkins-ci.org/browse/JENKINS-58684, which is a thread 
>>> leak caused by the SSE Gateway Plugin. You could try reverting the SSE 
>>> Gateway Plugin to version 1.17 to see if that helps, although that might 
>>> reintroduce a different, somewhat rarer memory leak 
>>> (https://issues.jenkins-ci.org/browse/JENKINS-51057). To test my 
>>> hypothesis, if you are running SSE Gateway Plugin version 1.19, you can 
>>> collect thread dumps over time and check whether you have a large number 
>>> of threads named “EventDispatcher.retryProcessor” (unfortunately, in 
>>> version 1.18 and below the threads are automatically named “Timer #n”, 
>>> which is less useful). That would confirm that you are hitting 
>>> JENKINS-58684.
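
Counting threads by name across the collected dumps can be scripted. The sketch below is a rough illustration, assuming the usual jstack layout in which each thread entry starts with the thread name in double quotes at the beginning of a line. The function names are invented here, and digits in names are collapsed to `N` so that numbered threads group together.

```python
import re
from collections import Counter
from pathlib import Path

# A jstack thread entry starts with the quoted thread name at column 0.
THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)


def count_threads(dump_text):
    """Count threads per name, with digit runs normalised to 'N'."""
    names = THREAD_HEADER.findall(dump_text)
    return Counter(re.sub(r"\d+", "N", name) for name in names)


def summarize(dump_files, top=5):
    """Return one summary line per dump file with its most common names."""
    lines = []
    for dump_file in dump_files:
        counts = count_threads(Path(dump_file).read_text())
        common = ", ".join(f"{name}={n}" for name, n in counts.most_common(top))
        lines.append(f"{dump_file}: {sum(counts.values())} threads; {common}")
    return lines
```

If the count for “EventDispatcher.retryProcessor” keeps growing from one dump to the next, that would point towards the leak described above.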
>>>
>>> The advice to stop building on master is definitely a good idea as well.
>>>
>>> On Aug 14, 2019, at 07:11, Sverre Moe <[email protected]> wrote:
>>>
>>> We got a free 30-minute CloudBees support session. It was too short to 
>>> dig deep enough to find the problem, but the person I talked to (after 
>>> examining our logs) mentioned what he thought was the problem and gave a 
>>> suggestion.
>>>
>>> We should not use the Jenkins master at all for builds (allocated with 
>>> the node("master") step). We had 15 executors on the Jenkins master.
>>>
>>> We could also try to increase the hard nofile and nproc limits for the 
>>> jenkins user, but the main recommendation was to remove all executors from 
>>> the Jenkins master.
>>> /etc/security/limits.conf:
>>> jenkins          soft    core            unlimited 
>>> jenkins          hard    core            unlimited 
>>> jenkins          soft    fsize           unlimited 
>>> jenkins          hard    fsize           unlimited 
>>> jenkins          soft    nofile          4096 
>>> jenkins          hard    nofile          10240 #Was 8192
>>> jenkins          soft    nproc           30654 
>>> jenkins          hard    nproc           60654 #Was 30654
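
Since limits.conf is applied when a session starts, an already running process keeps its old limits until it is restarted. On Linux, the limits in effect for the Jenkins process can be read from `/proc/<pid>/limits`. A minimal sketch, assuming that file's usual column layout (fields separated by two or more spaces); the function names are illustrative only.

```python
import re
from pathlib import Path


def parse_limits(text):
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def read_limits(pid):
    """Read the effective limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())
```

For example, `read_limits(<JENKINS_PID>)["Max processes"]` should show the new nproc values once Jenkins has been restarted under the changed configuration.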
>>>
>>>
>>> Removing the Jenkins master executors will take some time. We use the 
>>> Jenkins master when we publish our RPM build artifacts to our NFS file 
>>> storage. Since our RPM NFS share is only attached to the Jenkins master, 
>>> this is not possible at the moment, unless we build on another agent and 
>>> then SCP the RPM artifacts onto our Jenkins master.
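
The SCP variant mentioned above could look roughly like the following when run on a build agent. This is only a sketch: the remote target is a placeholder, key-based SSH access from the agent to the master is assumed, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def scp_command(files, remote):
    """Build an scp call in batch mode (-B: never prompt for a password)."""
    return ["scp", "-B", *files, remote]


def publish_rpms(rpm_dir, remote):
    """Copy every *.rpm in rpm_dir to the remote target, e.g. user@host:/path/."""
    rpms = sorted(str(path) for path in Path(rpm_dir).glob("*.rpm"))
    if not rpms:
        raise FileNotFoundError(f"no RPM files found in {rpm_dir}")
    # check=True makes a failed copy raise CalledProcessError.
    subprocess.run(scp_command(rpms, remote), check=True)
    return rpms
```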
>>>
>>>
>>> We had a few other cases where we used the Jenkins master, such as 
>>> checking out a file to determine which build agent to actually use. These 
>>> I have already changed to use any available build agent instead.
>>>
>>> On Tuesday, 6 August 2019 at 09:48:50 UTC+2, Sverre Moe wrote:
>>>>
>>>> Sadly I was mistaken. We do not use NFS for JENKINS_HOME.
>>>>
>>>> We do however use NFS for the location where builds copy the RPM build 
>>>> artifacts.
>>>>
>>>> On Monday, 5 August 2019 at 22:17:46 UTC+2, Ivan Fernandez Calvo wrote:
>>>>>
>>>>> Hi, 
>>>>>
>>>>> Sverre has another email thread open, I think it is about the same 
>>>>> Jenkins instance: 
>>>>> https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer.
>>>>>  
>>>>> I don't know what is happening on your instance, but wouldn't it 
>>>>> probably be better for you to open another email thread with the 
>>>>> description of your issue?
>>>>
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Jenkins Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>>
>>>

