We don't provision new slaves; we have a number of existing machines in a
build farm that we connect to. Hence I don't think a cloud is a solution,
but thanks for suggesting it.
More people have tested the plugin and have found that there is a risk of
deadlock. Does anyone have an idea how to solve this? More details below.
We want to use the Jenkins REST API as a common way to trigger and monitor
our build jobs, so we developed an application to do that work for us.
When I set up a new Jenkins instance about a month ago, the HTTP server used
by Jenkins was quite unstable. For no apparent reason, when the
application queued several builds in a batch, the HTTP server stopped
responding (both to REST API calls and to normal requests from a web
browser). The issue was easy to reproduce, but the cause was unclear,
because it was a brand-new instance with a minimal set of plugins.
Issue:
I used the JavaMelody plugin to monitor the Jenkins instance and
gathered information about what happened while Jenkins was not responding.
The thread monitor said:
Warning, the following threads are deadlocked : Handling GET
/myjenkins/login from 150.132.253.135 : RequestHandlerThread[#23]
Jenkins/login.jelly Jenkins/sidepanel.jelly View/sidepanel.jelly,
jenkins.util.Timer [#4], Thread-105
In the full thread dump I found more information:
Java stack information for the threads listed above:
===================================================
"Handling GET /myjenkins/threadDump from 150.132.253.50 : RequestHandlerThread[#25] Jenkins/threadDump.jelly Jenkins/sidepanel.jelly View/sidepanel.jelly":
        at hudson.model.Queue.getItems(Queue.java:692)
        - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
        at hudson.model.Queue$CachedItemList.get(Queue.java:228)
        […]
"jenkins.util.Timer [#3]":
        at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:212)
        - waiting to lock <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
        […]
"jenkins.util.Timer [#8]":
        at hudson.model.Queue.getBuildableItems(Queue.java:758)
        - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
        at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:224)
        - locked <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
        at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:172)
        […]
Found 1 deadlock.
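For anyone less used to reading JVM thread dumps: a deadlock is a cycle in the "waits for a lock that another thread holds" relation. The sketch below is plain Ruby and only illustrates that reading; it is not Jenkins code. The excerpt above does not show which thread owns the hudson.model.Queue monitor, so the "queue-owner (hypothetical)" entry is an assumption, as are the function name find_deadlock and the lock labels.

```ruby
# Minimal wait-for-graph sketch (illustration only, not Jenkins code).
# threads: name => { holds: [locks owned], waits_for: lock wanted or nil }
def find_deadlock(threads)
  # Map each lock to the thread that currently owns it.
  owner = {}
  threads.each { |name, t| t[:holds].each { |lock| owner[lock] = name } }

  threads.each_key do |start|
    seen = []
    current = start
    # Follow "wanted lock -> owner of that lock" until the chain ends
    # or we return to a thread that was already visited.
    while current && !seen.include?(current)
      seen << current
      wanted = threads[current][:waits_for]
      current = wanted && owner[wanted]
    end
    # A revisited thread closes a cycle; report only the threads on the cycle.
    return seen.drop(seen.index(current)) if current
  end
  nil
end

dump = {
  # From the excerpt: holds the Demand monitor, waits for the Queue monitor.
  "Timer #8"           => { holds: [:demand], waits_for: :queue },
  # From the excerpt: blocked on the Queue monitor, holds nothing relevant.
  "RequestHandler #25" => { holds: [],        waits_for: :queue },
  # ASSUMPTION: the excerpt does not show the owner of the Queue monitor;
  # this entry is hypothetical and only closes the cycle for illustration.
  "queue-owner (hypothetical)" => { holds: [:queue], waits_for: :demand }
}

cycle = find_deadlock(dump)
puts cycle ? "deadlock: #{cycle.join(' -> ')}" : "no deadlock"
```

Note that the request handler thread is blocked by the deadlock but is not part of the cycle itself, which matches the web UI hanging as a side effect.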
OK, so we have a deadlock caused either by Jenkins itself or by one of its
plugins. Because I had a scenario that reproduced the fault, I decided to
check stability with different versions of Jenkins: 1.590 (very new) and
1.565 (which I use in a different Jenkins instance without any problems).
The problem still occurred.
Then, while checking the installed plugins, I found that without the ITTE
Queue Listener Plugin the issue has not occurred again (so far for 5
days).
I am not familiar with the Jenkins architecture or with Jenkins plugin
development, but to me it looks really bad that it is possible to cause a
deadlock using the REST API. I haven’t analyzed in depth how this plugin
works, but the part marked below could be the cause of the deadlock:
    def check_computers
      Java.jenkins.model.Jenkins.getInstance.getComputers.each do |c|
        n = c.getNode
        next if !n.nil? && n.isHoldOffLaunchUntilSave
        c.getRetentionStrategy.check(c) # <=== deadlock reason?
      end
    end
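One direction that might avoid the suspected lock-order inversion is to hand the check to a separate worker thread instead of running it inside the callback. This assumes the plugin's callback runs while the hudson.model.Queue monitor is held, which I have not verified. The sketch below is plain Ruby with no Jenkins calls; DeferredChecker, request_check and shutdown are hypothetical names, not part of the plugin or of Jenkins, and the block passed to the constructor stands in for the real retention-strategy check.

```ruby
require 'thread'

# Hypothetical helper (not part of the plugin or Jenkins): runs checks on a
# dedicated worker thread so the caller never executes them while holding
# its own locks.
class DeferredChecker
  def initialize(&check)
    @pending = ::Queue.new   # Ruby's thread-safe queue, not hudson.model.Queue
    @check = check
    @worker = Thread.new do
      # Process requests one at a time until the :stop marker arrives.
      while (item = @pending.pop) != :stop
        @check.call(item)
      end
    end
  end

  # Called from the callback: only enqueues, so it returns immediately and
  # takes no lock other than the internal one of the Ruby queue.
  def request_check(item)
    @pending << item
  end

  # Drains the remaining requests, then stops the worker.
  def shutdown
    @pending << :stop
    @worker.join
  end
end

checked = []
checker = DeferredChecker.new { |computer| checked << computer }
checker.request_check("node-a")   # stand-in for a Jenkins computer object
checker.request_check("node-b")
checker.shutdown
p checked
```

Whether this is enough depends on which locks the real check takes and in what order, so please treat it as a starting point for discussion rather than a fix.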
Two full thread dumps of Jenkins taken while the deadlock was occurring, and
logs from JavaMelody, are available if they would be useful for
troubleshooting.
Any help in solving this would be appreciated!
Regards,
Susanne
On Saturday, August 2, 2014 4:40:59 AM UTC+2, Jesse Glick wrote:
>
> On Wed, Jul 23, 2014 at 6:52 AM, <[email protected]> wrote:
> > When I launch a new job while slaves are offline they take about a
> > minute to start launching.
>
> I think there is just a recurrent task that runs every 60s checking
> whether a slave should be brought online. Possibly this could be
> improved (in core) to react directly to the queue addition.
>
> In general this sort of thing is handled by a Cloud [1], not an
> explicitly-configured slave. Clouds also have a delay before they are
> asked to provision new slaves, but the timing can be adjusted by
> system properties.
>
>
> [1]
> https://wiki.jenkins-ci.org/display/JENKINS/Extension+points#Extensionpoints-hudson.slaves.Cloud
>
>
--
You received this message because you are subscribed to the Google Groups
"Jenkins Developers" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/jenkinsci-dev/9c9efae5-258c-44a3-b8dd-2daddb342add%40googlegroups.com.