On Wednesday, 10 December 2014, <[email protected]> wrote:

> We don't provision new slaves; we have a number of existing machines in a
> build farm that we connect to. Hence I don't think a cloud is a solution,
> but thanks for suggesting it.
>
> More people have tested the plugin and have found that there is a risk of
> deadlock. Does anyone have an idea how to solve this? More details below.
>
> We want to use the Jenkins REST API as a common way to trigger and monitor
> our build jobs, so we developed an application to do that work for us.
> When I set up a new Jenkins instance about a month ago, the HTTP server
> used by Jenkins was quite unstable. For no clear reason, when the
> application tried to queue several builds in a batch, the HTTP server died
> (both REST API calls and normal requests via a web browser). The issue was
> easy to reproduce, but had no obvious cause, because it was a brand-new
> instance with a minimal set of plugins.
>
> Issue:
> I used the JavaMelody plugin to monitor the Jenkins instance and gathered
> information about what happened when Jenkins was not responding.
> Thread monitor said:
> Warning, the following threads are deadlocked : Handling GET
> /myjenkins/login from 150.132.253.135 : RequestHandlerThread[#23]
> Jenkins/login.jelly Jenkins/sidepanel.jelly View/sidepanel.jelly,
> jenkins.util.Timer [#4], Thread-105
>
> And in the full thread dump I found more information:
>
> Java stack information for the threads listed above:
> ===================================================
> "Handling GET /myjenkins/threadDump from 150.132.253.50 :
> RequestHandlerThread[#25] Jenkins/threadDump.jelly Jenkins/sidepanel.jelly
> View/sidepanel.jelly":
>         at hudson.model.Queue.getItems(Queue.java:692)
>         - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
>         at hudson.model.Queue$CachedItemList.get(Queue.java:228)
>         […]
> "jenkins.util.Timer [#3]":
>         at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:212)
>         - waiting to lock <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
>         […]
> "jenkins.util.Timer [#8]":
>         at hudson.model.Queue.getBuildableItems(Queue.java:758)
>         - waiting to lock <0x00000000c1e0e998> (a hudson.model.Queue)
>         at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:224)
>         - locked <0x00000000c2ef23d8> (a hudson.slaves.RetentionStrategy$Demand)
>         at hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:172)
>         […]
>
> Found 1 deadlock.
>
> OK, so we have a deadlock caused by the Jenkins instance or by one of the
> Jenkins plugins. Because I had a scenario that reproduced the fault, I
> decided to check stability in different versions of Jenkins: 1.590 (very
> new) and 1.565 (which I use in a different Jenkins instance without any
> problems). The problem still occurred.
>
> And when I checked the installed plugins, I found that without the ITTE
> Queue Listener Plugin the issue has not occurred anymore (so far for 5
> days).
>
> I am not familiar with the Jenkins architecture or with Jenkins plugin
> development, but to me it looks really bad that it is possible to cause a
> deadlock using the REST API. I haven't analyzed in depth how this plugin
> works, but the indicated part could be the reason for the deadlock:
>
>     def check_computers
>       Java.jenkins.model.Jenkins.getInstance.getComputers.each do |c|
>         n = c.getNode
>         continue if !n.nil? && n.isHoldOffLaunchUntilSave
>         c.getRetentionStrategy.check(c) # <========================== deadlock reason?
>       end
>     end
>
> Two full thread dumps of Jenkins taken when the deadlock occurs, and logs
> from JavaMelody, are available if they would be useful for troubleshooting.
>
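
For anyone reading the dump above: what it appears to show is a lock-order inversion. Timer [#8] holds the `RetentionStrategy$Demand` monitor and waits for the `hudson.model.Queue` monitor. The frame that holds the Queue monitor is elided from the dump, so the following is only a minimal Java sketch under the assumption that some other thread takes the two locks in the opposite order. It is not Jenkins code: `LockOrderSketch`, `queueLock` and `strategyLock` are hypothetical stand-ins, and `tryLock` with a timeout is used only so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins, not Jenkins classes: queueLock plays the role of the
// hudson.model.Queue monitor, strategyLock that of RetentionStrategy$Demand.
class LockOrderSketch {
    static final ReentrantLock queueLock = new ReentrantLock();
    static final ReentrantLock strategyLock = new ReentrantLock();

    // Takes `first`, waits until the peer thread also holds its first lock,
    // then tries `second` with a timeout and counts a success if it gets it.
    static void attempt(ReentrantLock first, ReentrantLock second,
                        CyclicBarrier barrier, AtomicInteger successes) {
        first.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                successes.incrementAndGet();
                second.unlock();
            }
            barrier.await(); // keep `first` held until the peer has finished trying
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    // Runs the two tasks on separate threads and waits for both to finish.
    static void runBoth(Runnable a, Runnable b) {
        Thread t1 = new Thread(a);
        Thread t2 = new Thread(b);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Opposite acquisition order: each thread blocks the other's second lock.
    static int runInverted() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicInteger successes = new AtomicInteger();
        runBoth(() -> attempt(strategyLock, queueLock, barrier, successes),
                () -> attempt(queueLock, strategyLock, barrier, successes));
        return successes.get();
    }

    // Same acquisition order everywhere: Queue first, then the strategy.
    static int runConsistent() {
        AtomicInteger successes = new AtomicInteger();
        Runnable task = () -> {
            queueLock.lock();
            try {
                strategyLock.lock();
                try {
                    successes.incrementAndGet();
                } finally {
                    strategyLock.unlock();
                }
            } finally {
                queueLock.unlock();
            }
        };
        runBoth(task, task);
        return successes.get();
    }

    public static void main(String[] args) {
        System.out.println("inverted order, threads that got both locks: " + runInverted());
        System.out.println("consistent order, threads that got both locks: " + runConsistent());
    }
}
```

With opposite orders neither thread obtains its second lock. With plain `synchronized` and no timeout, that situation is the deadlock JavaMelody reported. With one agreed order, both threads complete. Whether a single order can actually be enforced between the Queue and the retention strategies in core is a separate question.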
Oh, that is totally the wrong way.

The whole question of synchronisation and locking around provisioning is
rife with bugs. There are maybe 2 or 3 actually correct cloud
implementations, and even they have bugs when put under stress (those bugs
are due to core, though).

Retention strategies are another area rife with bugs. For one, they fail to
acquire the correct locks when checking idle, and as such you can end up
with builds trying to run on a disconnected/non-existent node.

The provisioning strategies use incorrect stats and show strange effects for
certain patterns of load.

I am working on correcting the issues in core that cause some of these
problems (a lot stem from the over- and misuse of volatile to try and avoid
using locks *correctly*... A certain core committer of Jenkins is a
`volatile` fanboy ;-) ). I currently have two pull requests open in this
area, for example.

>
> Any help in solving this would be appreciated!
>
> Regards,
> Susanne
>
> On Saturday, August 2, 2014 4:40:59 AM UTC+2, Jesse Glick wrote:
>>
>> On Wed, Jul 23, 2014 at 6:52 AM, <[email protected]> wrote:
>> > When I launch a new job while slaves are offline they take about a
>> > minute to start launching.
>>
>> I think there is just a recurrent task that runs every 60s checking
>> whether a slave should be brought online. Possibly this could be
>> improved (in core) to react directly to the queue addition.
>>
>> In general this sort of thing is handled by a Cloud [1], not an
>> explicitly-configured slave. Clouds also have a delay before they are
>> asked to provision new slaves, but the timing can be adjusted by
>> system properties.
>>
>> [1] https://wiki.jenkins-ci.org/display/JENKINS/Extension+points#Extensionpoints-hudson.slaves.Cloud
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/9c9efae5-258c-44a3-b8dd-2daddb342add%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Sent from my phone
