I created a Pipeline job that runs jstack every 10 minutes (it runs on the 
Jenkins master, since that is where the Jenkins process is running).
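
For reference, a minimal sketch of what such a periodic job could execute. 
It assumes that jstack is on the PATH, that the script runs as the same user 
as the Jenkins JVM, and that the PID is passed in by the caller. The function 
names and the output directory are made up for illustration; they are not 
part of Jenkins or the JDK.

```python
import subprocess
import time
from pathlib import Path


def dump_file_name(pid: int, timestamp: float) -> str:
    """Build a sortable file name from the PID and a UTC timestamp."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(timestamp))
    return f"jstack-{pid}-{stamp}.txt"


def take_thread_dump(pid: int, out_dir: Path, jstack: str = "jstack") -> Path:
    """Run `jstack <pid>` once and save its output under out_dir.

    Raises subprocess.CalledProcessError if jstack exits with an error,
    for example when the PID does not belong to a running JVM.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        [jstack, str(pid)], capture_output=True, text=True, check=True
    )
    target = out_dir / dump_file_name(pid, time.time())
    target.write_text(result.stdout)
    return target


# Hypothetical usage from a job that is triggered every 10 minutes:
#   take_thread_dump(12345, Path("thread-dumps"))
```

Scheduling is left to the job trigger, so the script only takes one dump per 
run and keeps every dump as a separate timestamped file.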

On Wednesday, 14 August 2019 at 16:07:02 UTC+2, Félix Belzunce Arcos wrote:
>
> Hi Sverre Moe,
>
> I am the person who talked to you this morning :-)
>
> The long-term solution is to stop building on the master. That avoids the 
> performance issues and the need to raise the process and open-file limits 
> on the machine where the Jenkins master is located. Building on the master 
> is also not recommended from a security point of view.
>
> The short-term solution would be to raise the limit on new processes on 
> this machine and to take thread dumps from the master every 10 minutes. For 
> this, you can create a freestyle job with a cron trigger that runs 
> jstack <JENKINS_PID> every 10 minutes. When the issue happens, you can look 
> at the latest 10 builds with their thread dumps and try to figure out what 
> is actually consuming so many threads on the master.
>
> I hope this helps,
>
>
> On Wednesday, 14 August 2019 at 15:38:17 (UTC+2), Devin Nusbaum wrote:
>>
>> I have not read the whole thread in detail, but the “Unable to create new 
>> native thread” OutOfMemoryErrors from your original thread, where one of 
>> the stack traces involves 
>> org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing, 
>> look like they could be related to 
>> https://issues.jenkins-ci.org/browse/JENKINS-58684, which is a thread 
>> leak caused by the SSE Gateway Plugin. You could try reverting the SSE 
>> Gateway Plugin to version 1.17 to see if that helps, although that might 
>> reintroduce a different, somewhat rarer memory leak (
>> https://issues.jenkins-ci.org/browse/JENKINS-51057). To test my 
>> hypothesis, if you are running SSE Gateway Plugin version 1.19, you can 
>> collect thread dumps over time and see if you seem to have a large number 
>> of threads named “EventDispatcher.retryProcessor” (unfortunately in version 
>> 1.18 and below the threads are automatically named “Timer #n”, which is 
>> less useful), which would confirm that you are hitting JENKINS-58684 
>> <https://issues.jenkins-ci.org/browse/JENKINS-58684>.
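
A rough sketch of how the collected dumps could be checked for this. It 
assumes the usual jstack text format, where each thread starts with a header 
line whose first field is the quoted thread name. Digits in names are 
replaced with "N" so that numbered threads such as "Timer #1" and "Timer #2" 
are counted together. The function name is made up for illustration.

```python
import re
from collections import Counter

# A jstack thread header starts at column 0 with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(.*?)"\s', re.MULTILINE)


def count_thread_names(dump_text: str) -> Counter:
    """Count threads in one jstack dump, grouped by normalized name."""
    names = THREAD_HEADER.findall(dump_text)
    # Collapse digit runs so "Timer #1" and "Timer #42" share one bucket.
    return Counter(re.sub(r"\d+", "N", name) for name in names)


# Hypothetical usage on a saved dump file:
#   counts = count_thread_names(open("jstack-dump.txt").read())
#   for name, total in counts.most_common(10):
#       print(total, name)
```

A count for "EventDispatcher.retryProcessor" that keeps growing from one dump 
to the next would be consistent with the leak described above.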
>>
>> The advice to stop building on master is definitely a good idea as well.
>>
>> On Aug 14, 2019, at 07:11, Sverre Moe <[email protected]> wrote:
>>
>> We got a free 30-minute CloudBees support session. It was too short to dig 
>> deep enough to find the problem, but the person I was talking to (after 
>> examining our logs) mentioned what he thought the problem was and gave a 
>> suggestion.
>>
>> We should not use Jenkins master at all for builds (allocated with the 
>> node("master") step). We had 15 Executors for Jenkins master.
>>
>> We could also try to increase the hard nofile and nproc limits for the 
>> jenkins user, but the main recommendation was to remove all Executors from 
>> the Jenkins master.
>> > /etc/security/limits.conf
>> jenkins          soft    core            unlimited 
>> jenkins          hard    core            unlimited 
>> jenkins          soft    fsize           unlimited 
>> jenkins          hard    fsize           unlimited 
>> jenkins          soft    nofile          4096 
>> jenkins          hard    nofile          10240 #Was 8192
>> jenkins          soft    nproc           30654 
>> jenkins          hard    nproc           60654 #Was 30654
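
A small sketch for checking which nofile and nproc limits a process actually 
sees after such a change. It uses Python's standard resource module and 
assumes a Linux host, where RLIMIT_NPROC is available. It reports the limits 
of the process running the script, so it would have to be started the same 
way as Jenkins (same user, same service manager) to be representative. The 
function names are made up for illustration.

```python
import resource


def format_limit(value: int) -> str:
    """Render a limit value the way limits.conf spells it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return the (soft, hard) limits of the current process."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def describe_limits() -> str:
    """Format the current limits as one line per resource."""
    lines = []
    for name, (soft, hard) in current_limits().items():
        lines.append(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
    return "\n".join(lines)


# Hypothetical usage, run as the jenkins user:
#   print(describe_limits())
```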
>>
>>
>> Removing the Jenkins master Executors will take some time. We use the 
>> Jenkins master when we publish our RPM build artifacts to our NFS file 
>> storage. Since our RPM NFS share is only attached to the Jenkins master, 
>> that is not possible at the moment, unless we build on another agent and 
>> then SCP the RPM artifacts onto our Jenkins master.
>>
>>
>> There were a few other cases where we used the Jenkins master, such as 
>> checking out a file to determine which build agent to actually use. These I 
>> have already changed to use any available build agent instead.
>>
>> On Tuesday, 6 August 2019 at 09:48:50 UTC+2, Sverre Moe wrote:
>>>
>>> Sadly I was mistaken. We do not use NFS for JENKINS_HOME.
>>>
>>> We do however use NFS for the location where builds copy the RPM build 
>>> artifacts.
>>>
>>> On Monday, 5 August 2019 at 22:17:46 UTC+2, Ivan Fernandez Calvo wrote:
>>>>
>>>> Hi, 
>>>>
>>>> Sverre has another email thread open; I think it is about the same 
>>>> Jenkins instance: 
>>>> https://groups.google.com/d/msgid/jenkinsci-users/cc2d0bdb-b15f-4bec-a0a3-0562ea8c7df7%40googlegroups.com?utm_medium=email&utm_source=footer.
>>>>  
>>>> I don't know what is happening on your instance, but it is probably not 
>>>> a good idea to open another email thread describing the same issue.
>>>
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Jenkins Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/jenkinsci-users/3e728790-b2f5-4ae1-a9fe-512a5c912d61%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>>
>>

