Issue Type: Bug
Assignee: Vinod Kone
Components: mesos-plugin
Created: 30/Mar/15 3:27 PM
Description:

It seems that when JenkinsScheduler.statusUpdate() tries to stop the scheduler while the retention timer of a slave is trying to stop that slave, the two threads can end up in a deadlock.

This is because the timer thread holds the lock on the MesosImpl instance while the statusUpdate() thread holds SUPERVISOR_LOCK. MesosImpl then tries to terminate the slave and waits for the statusUpdate() thread to release SUPERVISOR_LOCK. However, statusUpdate() also needs the lock on MesosImpl in order to stop the scheduler. As a result, each thread waits for a lock the other one holds.
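
To make the lock-order inversion concrete, here is a minimal, hypothetical reproduction that does not use the plugin at all. A plain Object stands in for the MesosImpl instance, and a ReentrantLock stands in for SUPERVISOR_LOCK. The class and variable names are invented for this sketch.

Each thread takes one lock, waits until the other thread holds its lock, and then asks for the other one. This is the same cycle as in the thread dump. The JDK's ThreadMXBean.findDeadlockedThreads() reports cycles involving both object monitors and ownable synchronizers such as ReentrantLock, so it should flag this one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionDemo {

    public static void main(String[] args) {
        System.out.println(reproduce() ? "deadlock detected" : "no deadlock detected");
    }

    /** Starts two threads that take the same two locks in opposite order; returns true if the JVM reports a deadlock. */
    public static boolean reproduce() {
        Object mesosImpl = new Object();                    // hypothetical stand-in for the MesosImpl monitor
        ReentrantLock supervisorLock = new ReentrantLock(); // hypothetical stand-in for SUPERVISOR_LOCK
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the retention timer: holds the MesosImpl monitor, then supervise() wants SUPERVISOR_LOCK.
        Thread timer = new Thread(() -> {
            synchronized (mesosImpl) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                supervisorLock.lock(); // never succeeds: the statusUpdate thread holds it
                supervisorLock.unlock();
            }
        }, "timer");

        // Like statusUpdate(): holds SUPERVISOR_LOCK, then stopScheduler() wants the MesosImpl monitor.
        Thread statusUpdate = new Thread(() -> {
            supervisorLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (mesosImpl) { // never entered: the timer thread holds the monitor
                }
            } finally {
                supervisorLock.unlock();
            }
        }, "statusUpdate");

        // Daemon threads, so the JVM can still exit while they stay blocked.
        timer.setDaemon(true);
        statusUpdate.setDaemon(true);
        timer.start();
        statusUpdate.start();

        // Poll for up to about 5 seconds; findDeadlockedThreads() returns null when no cycle exists.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            if (mx.findDeadlockedThreads() != null) {
                return true;
            }
            sleepQuietly(100);
        }
        return false;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The latch only makes the bad interleaving deterministic for the demo. In the plugin, the same interleaving presumably happens by chance, which would explain why the deadlock is rare.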

Here is the thread dump. I use a slightly modified version of Mesos plugin 0.6.0, so the line numbers are probably not exactly right:

"Thread-2516073" - Thread t@2898790
   java.lang.Thread.State: BLOCKED
    at org.jenkinsci.plugins.mesos.Mesos$MesosImpl.stopScheduler(Mesos.java:141)
    - waiting to lock <62132b60> (a org.jenkinsci.plugins.mesos.Mesos$MesosImpl) owned by "jenkins.util.Timer [#9]" t@66
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.supervise(JenkinsScheduler.java:749)
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:634)

   Locked ownable synchronizers:
    - locked <3af5466a> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"jenkins.util.Timer [#9]" - Thread t@66
   java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Native Method)
    - waiting to lock <3af5466a> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "Thread-2516073" t@2898790
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.supervise(JenkinsScheduler.java:725)
    at org.jenkinsci.plugins.mesos.JenkinsScheduler.terminateJenkinsSlave(JenkinsScheduler.java:220)
    - locked <55398768> (a org.jenkinsci.plugins.mesos.JenkinsScheduler)
    at org.jenkinsci.plugins.mesos.Mesos$MesosImpl.stopJenkinsSlave(Mesos.java:157)
    - locked <62132b60> (a org.jenkinsci.plugins.mesos.Mesos$MesosImpl)
    at org.jenkinsci.plugins.mesos.MesosComputerLauncher.terminate(MesosComputerLauncher.java:122)
    at org.jenkinsci.plugins.mesos.MesosSlave.terminate(MesosSlave.java:91)
    at org.jenkinsci.plugins.mesos.MesosRetentionStrategy.check(MesosRetentionStrategy.java:70)
    - locked <75b63404> (a org.jenkinsci.plugins.mesos.MesosRetentionStrategy)
    at org.jenkinsci.plugins.mesos.MesosRetentionStrategy.check(MesosRetentionStrategy.java:26)
    at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
    - locked <703c7665> (a java.util.concurrent.ThreadPoolExecutor$Worker)

I tried to solve the problem myself, but I got tangled up in all the synchronized calls. My best guess is that the cause is the synchronized calls crossing back and forth between MesosImpl and JenkinsScheduler, which take the same locks in different orders.
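
One common way out of this kind of cycle is to make every code path acquire the locks in the same global order. The sketch below is hypothetical: ConsistentLockOrder and its methods are invented names, not plugin code, and the sketch models only two of the locks. Both the slave-stop path and the scheduler-stop path take SUPERVISOR_LOCK first and the MesosImpl monitor second. Because of that, neither thread can hold one lock while waiting for the other.

In the real plugin, the timer thread also holds the JenkinsScheduler monitor, so an actual fix would have to order that lock as well. Releasing SUPERVISOR_LOCK before calling stopScheduler() might be another option.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ConsistentLockOrder {
    private final Object mesosImpl = new Object();                    // hypothetical stand-in for the MesosImpl monitor
    private final ReentrantLock supervisorLock = new ReentrantLock(); // hypothetical stand-in for SUPERVISOR_LOCK
    private int stops = 0;                                            // guarded by mesosImpl

    // Modeled on the retention timer path: SUPERVISOR_LOCK first, then the MesosImpl monitor.
    void stopSlave() {
        supervisorLock.lock();
        try {
            synchronized (mesosImpl) {
                stops++;
            }
        } finally {
            supervisorLock.unlock();
        }
    }

    // Modeled on statusUpdate() -> stopScheduler(): same order, so no wait cycle can form.
    void stopScheduler() {
        supervisorLock.lock();
        try {
            synchronized (mesosImpl) {
                stops++;
            }
        } finally {
            supervisorLock.unlock();
        }
    }

    int stopCount() {
        synchronized (mesosImpl) {
            return stops;
        }
    }

    /** Runs both paths from two threads; returns true if both finish within the timeout. */
    public static boolean runConcurrently(int iterations) {
        ConsistentLockOrder locks = new ConsistentLockOrder();
        Thread timer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) locks.stopSlave();
        });
        Thread statusUpdate = new Thread(() -> {
            for (int i = 0; i < iterations; i++) locks.stopScheduler();
        });
        timer.setDaemon(true);
        statusUpdate.setDaemon(true);
        timer.start();
        statusUpdate.start();
        try {
            timer.join(10_000);
            statusUpdate.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !timer.isAlive() && !statusUpdate.isAlive() && locks.stopCount() == 2 * iterations;
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(100_000) ? "finished without deadlock" : "did not finish");
    }
}
```

Finishing a stress run like this does not prove the absence of deadlock. The guarantee comes from the fixed acquisition order, not from the test.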

Maybe some Java whiz can solve it.

PS: I also posted this on the GitHub issues page, because it seems to be more active (https://github.com/jenkinsci/mesos-plugin/issues/97).

Environment: mesos-plugin 0.6.0 (slightly modified)
Project: Jenkins
Priority: Minor
Reporter: Stefan Eder