Change By: Oliver Gondža (03/Dec/14 9:38 AM)
Description: We observe a deadlock in the JClouds plugin that happens in the following scenario:

1. There is a thread (thread A) that performs deployment of a new slave
2. Another thread (thread B) is working at the same time and decides to take down one of the existing nodes
3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
4. Thread B performs setTemporarilyOffline on the computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30 . At that moment it already holds the lock on the JCloudsRetentionStrategy instance because it is synchronized
5. addNode is synchronized on Hudson instance and locks it
6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on the node being deleted in thread B and blocks waiting for the lock on that strategy instance, which thread B holds
7. setTemporarilyOffline calls Jenkins.save, which locks on the Jenkins (i.e. Hudson) instance. That instance is already locked by thread A in step 5, so neither thread can proceed.
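
For illustration, the lock ordering described in the steps above can be reproduced outside Jenkins. The following is only a minimal sketch, not code from Jenkins or the JClouds plugin: the class name, the two plain lock objects standing in for the Hudson instance and the JCloudsRetentionStrategy instance, and the helper methods are all hypothetical. It forces both threads to hold their first lock before requesting the second, then asks the JVM which threads are deadlocked on monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns how many of them the JVM reports as monitor-deadlocked
     * (0 if none are reported within about five seconds).
     */
    public static int reproduce() {
        final Object jenkinsLock = new Object();   // stand-in for the Hudson instance
        final Object strategyLock = new Object();  // stand-in for the retention strategy
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Plays thread A: Jenkins lock first (addNode), then the strategy lock (check).
        Thread a = new Thread(() -> {
            synchronized (jenkinsLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (strategyLock) { /* never reached */ }
            }
        }, "demo-thread-A");

        // Plays thread B: strategy lock first (check), then the Jenkins lock (save).
        Thread b = new Thread(() -> {
            synchronized (strategyLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (jenkinsLock) { /* never reached */ }
            }
        }, "demo-thread-B");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
                int count = 0;
                if (ids != null) {
                    for (long id : ids) {
                        if (id == a.getId() || id == b.getId()) {
                            count++;
                        }
                    }
                }
                if (count == 2) {
                    return count;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    // Signals that this thread holds its first lock, then waits for the other thread.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked demo threads: " + reproduce());
    }
}
```

The latch only makes the interleaving deterministic for the demo. In the real scenario the same ordering depends on timing, which would explain why it shows up under load.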

The stack trace of thread A is therefore as follows:
===================================================

{noformat}
Name: Computer.threadPoolForRemoting [#135]
State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer [#8]
Total blocked: 8  Total waited: 1

Stack trace: 
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
   - locked java.lang.Object@5a4034b5
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
jenkins.model.Jenkins.addNode(Jenkins.java:1693)
   - locked hudson.model.Hudson@32da12ad
jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
java.util.concurrent.FutureTask.run(FutureTask.java:262)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{noformat}

The stack trace of thread B is as follows:
==========================================

{noformat}
Name: jenkins.util.Timer [#8]
State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting [#135]
Total blocked: 2  Total waited: 274

Stack trace: 
jenkins.model.Jenkins.save(Jenkins.java:2672)
hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
   - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{noformat}
=====================================

In practice, this deadlock causes serious problems during normal operation of the JClouds plugin under moderately high load.
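
One possible way to break the cycle, offered only as a sketch and not as a fix taken from the plugin, is to make the retention check non-blocking: a thread that cannot get the strategy lock immediately skips the check instead of waiting while it holds the Jenkins lock. The class and method names below are hypothetical, and the `Runnable` body stands in for whatever the real check does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class NonBlockingCheckSketch {
    // Explicit lock in place of a synchronized method (an assumption of this sketch).
    private final ReentrantLock checkLock = new ReentrantLock();

    /** Runs body and returns true, or returns false at once if another thread is mid-check. */
    public boolean check(Runnable body) {
        if (!checkLock.tryLock()) {
            return false; // never block here; the caller may be holding the Jenkins lock
        }
        try {
            body.run();
            return true;
        } finally {
            checkLock.unlock();
        }
    }

    /** Demo: returns whether a second thread's check ran while the first was still inside. */
    public static boolean secondCheckRanWhileBusy() {
        NonBlockingCheckSketch strategy = new NonBlockingCheckSketch();
        CountDownLatch inside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Plays thread B: stays inside check() until released.
        Thread busy = new Thread(() -> strategy.check(() -> {
            inside.countDown();
            awaitQuietly(release);
        }), "busy-check");
        busy.setDaemon(true);
        busy.start();

        awaitQuietly(inside);
        // Plays thread A: calls check() while thread B still holds the lock.
        boolean ran = strategy.check(() -> { });
        release.countDown();
        return ran;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Second check ran while busy: " + secondCheckRanWhileBusy());
    }
}
```

The trade-off is that a skipped check does nothing until the next call, which seems acceptable if the retention check is invoked periodically, as the ComputerRetentionWork frame in thread B's trace suggests.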
Component/s: jclouds-jenkins