Issue Type: Bug
Affects Versions: current
Assignee: abayer
Components: jclouds, jclouds-jenkins
Created: 09/Apr/14 1:17 PM
Description:

We observe a deadlock in the JClouds plugin that happens in the following scenario:

1. There is a thread (thread A) that performs deployment of a new slave
2. Another thread (thread B) runs at the same time and decides to take down one of the existing nodes
3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
4. Thread B calls setTemporarilyOffline on the computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30. At that moment it already holds the lock on the JCloudsRetentionStrategy instance, because JCloudsRetentionStrategy.check is synchronized
5. addNode is synchronized on the Hudson instance and locks it
6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on the node that thread B is taking down and tries to lock the same JCloudsRetentionStrategy instance that thread B has held since step 4
7. setTemporarilyOffline calls Jenkins.save, which tries to lock the Jenkins (i.e. Hudson) instance that thread A has held since step 5. Each thread now waits for a lock held by the other, so neither can proceed.
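Steps 3 to 7 are a classic lock-ordering deadlock: thread A takes the Jenkins monitor and then the retention-strategy monitor, while thread B takes the same two monitors in the opposite order. The following is a minimal, self-contained sketch of that pattern only. It uses no Jenkins or jclouds classes; the class name LockOrderDeadlock and the two plain Object monitors jenkinsLock and strategyLock are hypothetical stand-ins for the Hudson instance and the JCloudsRetentionStrategy instance.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, self-contained model of the lock-order inversion in steps 3-7.
 * jenkinsLock stands in for the Hudson/Jenkins instance, strategyLock for the
 * JCloudsRetentionStrategy instance. No Jenkins or jclouds classes are used.
 */
public class LockOrderDeadlock {

    /** Starts both threads and returns true once both are BLOCKED on each other's monitor. */
    static boolean reproduce() {
        final Object jenkinsLock = new Object();
        final Object strategyLock = new Object();
        // Makes sure each thread holds its first monitor before either asks for the second,
        // so the problematic interleaving happens every time instead of only under load.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Thread A: addNode() holds the Jenkins monitor, then updateComputerList()
        // ends up in the synchronized check() and needs the strategy monitor.
        Thread a = new Thread(() -> {
            synchronized (jenkinsLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (strategyLock) {
                    // never reached
                }
            }
        }, "thread A (addNode)");

        // Thread B: the synchronized check() holds the strategy monitor, then
        // setTemporarilyOffline() -> Jenkins.save() needs the Jenkins monitor.
        Thread b = new Thread(() -> {
            synchronized (strategyLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (jenkinsLock) {
                    // never reached
                }
            }
        }, "thread B (retention check)");

        // Daemon threads so the JVM can still exit after the demo.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        // Poll for up to 5 seconds until both threads are blocked on a monitor.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            if (a.getState() == Thread.State.BLOCKED && b.getState() == Thread.State.BLOCKED) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("both threads deadlocked: " + reproduce());
    }
}
```

The latch only forces the timing that, in the plugin, depends on load. Neither blocked thread can ever acquire its second monitor, and nothing in the JVM breaks a cycle between synchronized monitors; the daemon flag here just lets the demo process exit.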

The stack trace of thread A is therefore as follows:
===================================================

Name: Computer.threadPoolForRemoting 135
State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer 8
Total blocked: 8 Total waited: 1

Stack trace:
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
   - locked java.lang.Object@5a4034b5
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
jenkins.model.Jenkins.addNode(Jenkins.java:1693)
   - locked hudson.model.Hudson@32da12ad
jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
java.util.concurrent.FutureTask.run(FutureTask.java:262)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)

The stack trace of thread B is as follows:
==========================================

Name: jenkins.util.Timer 8
State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting 135
Total blocked: 2 Total waited: 274

Stack trace:
jenkins.model.Jenkins.save(Jenkins.java:2672)
hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
   - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)

=====================================
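The two thread views above (Name, "State: BLOCKED on ... owned by", Total blocked/waited, and "- locked" lines) look like JConsole's Threads tab, although the report does not say which tool produced them. The same information is available in-process through java.lang.management. A sketch, assuming a hypothetical DeadlockReport class, that finds monitor-deadlocked threads with ThreadMXBean.findMonitorDeadlockedThreads() and prints them in a similar layout:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that prints monitor-deadlocked threads in roughly the
 * JConsole layout used above (Name / State ... owned by / stack with "- locked" lines).
 */
public class DeadlockReport {

    /** Returns one text block per monitor-deadlocked thread; empty if there are none. */
    static List<String> deadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
        if (ids == null) {
            return report;
        }
        // lockedMonitors = true so each frame's held monitors can be listed;
        // lockedSynchronizers = false because this sketch only looks at monitors.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, false)) {
            if (info == null) {
                continue; // the thread terminated after detection
            }
            StringBuilder sb = new StringBuilder();
            sb.append("Name: ").append(info.getThreadName()).append('\n');
            sb.append("State: ").append(info.getThreadState())
              .append(" on ").append(info.getLockName())
              .append(" owned by: ").append(info.getLockOwnerName()).append('\n');
            sb.append("Total blocked: ").append(info.getBlockedCount())
              .append("  Total waited: ").append(info.getWaitedCount()).append('\n');
            sb.append("Stack trace:\n");
            StackTraceElement[] frames = info.getStackTrace();
            MonitorInfo[] held = info.getLockedMonitors();
            for (int depth = 0; depth < frames.length; depth++) {
                sb.append(frames[depth]).append('\n');
                // A monitor's locked stack depth is the index of the frame that acquired it.
                for (MonitorInfo m : held) {
                    if (m.getLockedStackDepth() == depth) {
                        sb.append("   - locked ").append(m).append('\n');
                    }
                }
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = deadlockedThreads();
        if (report.isEmpty()) {
            System.out.println("No monitor deadlock detected.");
        }
        report.forEach(System.out::println);
    }
}
```

Called periodically (for example from a watchdog thread), a check like this should list both thread A and thread B in the situation described here. It only reports cycles between synchronized monitors, which is the kind of deadlock in this issue; java.util.concurrent locks would need findDeadlockedThreads() instead.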

This causes severe problems during normal operation of the JClouds plugin under moderately high load.

Environment: jclouds working with OpenStack
Project: Jenkins
Labels: plugin deadlock
Priority: Blocker
Reporter: Ivan Kalinin