We don't provision new slaves; we have a number of existing machines in a 
build farm that we connect to. Hence I don't think a cloud is a solution, 
but thanks for suggesting it.

More people have tested the plugin and found that there is a risk of 
deadlock. Does anyone have an idea how to solve this? More details below.

We want to use the Jenkins REST API as a common way to trigger and monitor 
our build jobs, so we developed an application to do that work for us. 
When I set up a new Jenkins instance about a month ago, the HTTP server used 
by Jenkins was quite unstable. For no clear reason, when the 
application queued several builds in a batch, the HTTP server stopped 
responding (to both REST API calls and normal requests from a web browser). 
The issue was easy to reproduce, yet puzzling, because it was a brand-new 
instance with a minimal set of plugins.

Issue:
I used the JavaMelody plugin to monitor the Jenkins instance and 
gathered information about what happened when Jenkins was not responding. 
The thread monitor said:
Warning, the following threads are deadlocked : Handling GET 
/myjenkins/login from 150.132.253.135 : RequestHandlerThread[#23] 
Jenkins/login.jelly Jenkins/sidepanel.jelly View/sidepanel.jelly, 
jenkins.util.Timer [#4], Thread-105

In the full thread dump I found more information:

Java stack information for the threads listed above:
===================================================
"Handling GET /myjenkins/threadDump from 150.132.253.50 : RequestHandlerThread[#25] Jenkins/threadDump.jelly Jenkins/sidepanel.jelly View/sidepanel.jelly":
                at hudson.model.Queue.getItems(Queue.java:692)
                - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
                at hudson.model.Queue$CachedItemList.get(Queue.java:228)
                               […]
"jenkins.util.Timer [#3]":
                at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:212)
                - waiting to lock <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
                               […]
"jenkins.util.Timer [#8]":
                at hudson.model.Queue.getBuildableItems(Queue.java:758)
                - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
                at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:224)
                - locked <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
                at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:172)
                               […]

Found 1 deadlock.


OK, so we have a deadlock caused either by Jenkins itself or by one of the 
Jenkins plugins. Because I had a scenario that reproduced the fault, I 
decided to check stability on different versions of Jenkins: 1.590 (very 
new) and 1.565 (which I use on a different Jenkins instance without any 
problems). The problem still occurred.

Then, while checking the installed plugins, I found that without the ITTE 
Queue Listener Plugin the issue no longer occurs (so far for 5 
days).

I am not familiar with the Jenkins architecture or with Jenkins plugin 
development, but to me it looks really bad that it is possible to cause a 
deadlock using the REST API. I haven't analyzed in depth how this plugin 
works, but the part marked below could be the cause of the deadlock:

def check_computers
  Java.jenkins.model.Jenkins.getInstance.getComputers.each do |c|
    n = c.getNode
    # Skip nodes whose launch is held off until their configuration is saved.
    next if !n.nil? && n.isHoldOffLaunchUntilSave
    c.getRetentionStrategy.check(c) # <========================== deadlock reason?
  end
end
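
To illustrate what I suspect is going on, here is a minimal sketch in plain 
Ruby (no Jenkins classes). It rests on assumptions I have not verified: that 
the plugin's listener callback runs while the Queue lock is already held, and 
that it then asks for the RetentionStrategy$Demand lock, which is the opposite 
order to the Timer thread in the dump (Demand first, then Queue). The class 
and method names (DeferredCheck, on_queue_event, drain) are made up for this 
example. One possible way out is for the listener only to record that a check 
is wanted and to run the check later, outside the Queue lock:

```ruby
# Hypothetical stand-ins for the two monitors named in the thread dump.
class DeferredCheck
  attr_reader :checks_run

  def initialize
    @queue_lock  = Mutex.new          # plays the role of the hudson.model.Queue monitor
    @demand_lock = Mutex.new          # plays the role of the RetentionStrategy$Demand monitor
    @pending     = Thread::Queue.new  # check requests recorded by the listener
    @checks_run  = 0
  end

  # Timer path as seen in the dump: Demand lock first, then Queue lock.
  def check
    @demand_lock.synchronize do
      @queue_lock.synchronize { @checks_run += 1 }
    end
  end

  # Listener path, ASSUMED to run while the Queue lock is held.
  # It only records a request and never takes the Demand lock here.
  def on_queue_event
    @queue_lock.synchronize { @pending << :check_requested }
  end

  # Runs outside the Queue lock, so it uses the same lock order as the timer.
  def drain
    ran = 0
    loop do
      begin
        @pending.pop(true) # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break
      end
      check
      ran += 1
    end
    ran
  end
end

sim      = DeferredCheck.new
listener = Thread.new { 100.times { sim.on_queue_event } }
timer    = Thread.new { 100.times { sim.check } }
[listener, timer].each(&:join)
drained = sim.drain
puts "deferred checks run: #{drained}, total checks: #{sim.checks_run}"
```

Because no thread ever holds the stand-in Queue lock while waiting for the 
stand-in Demand lock, the two paths cannot block each other in this model. 
Whether the real plugin can defer its check this way is something I cannot 
judge.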

Two full thread dumps of Jenkins taken while the deadlock occurred, and logs 
from JavaMelody, are available if they would be useful for troubleshooting.

Any help in solving this would be appreciated!

Regards,

Susanne


On Saturday, August 2, 2014 4:40:59 AM UTC+2, Jesse Glick wrote:
>
> On Wed, Jul 23, 2014 at 6:52 AM,  <[email protected]> wrote: 
> > When I launch a new job while slaves are offline they take about a 
> > minute to start launching. 
>
> I think there is just a recurrent task that runs every 60s checking 
> whether a slave should be brought online. Possibly this could be 
> improved (in core) to react directly to the queue addition. 
>
> In general this sort of thing is handled by a Cloud [1], not an 
> explicitly-configured slave. Clouds also have a delay before they are 
> asked to provision new slaves, but the timing can be adjusted by 
> system properties. 
>
>
> [1] 
> https://wiki.jenkins-ci.org/display/JENKINS/Extension+points#Extensionpoints-hudson.slaves.Cloud
>  
>
