Sam Corbett created BROOKLYN-332:
------------------------------------
Summary: Blocked task holding mutex lives beyond application lifetime and blocks tasks in subsequent applications
Key: BROOKLYN-332
URL: https://issues.apache.org/jira/browse/BROOKLYN-332
Project: Brooklyn
Issue Type: Bug
Reporter: Sam Corbett
Andrea deployed a VanillaJavaApp that got stuck during start on a task that was
never going to complete. Before blocking, the stuck task had obtained the mutex on
an SshMachineLocation in ArchiveUtils.deploy. It then waited in
BasicTask.blockUntilStarted for a process task that never started. The
application was stopped, but the task was not cancelled, so the mutex was never
released. This stack trace is from a thread dump taken after stopping the app:
{code}
"brooklyn-execmanager-EiOrzrfj-9" #54 daemon prio=5 os_prio=31 tid=0x00007fa7f191a800 nid=0x8e03 in Object.wait() [0x0000700003151000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.brooklyn.util.core.task.BasicTask.blockUntilStarted(BasicTask.java:389)
	- locked <0x0000000784e96ac8> (a org.apache.brooklyn.util.core.task.BasicTask)
	at org.apache.brooklyn.util.core.task.BasicTask.blockUntilStarted(BasicTask.java:378)
	- locked <0x0000000784e96ac8> (a org.apache.brooklyn.util.core.task.BasicTask)
	at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:360)
	at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
	at org.apache.brooklyn.util.core.task.system.ProcessTaskWrapper.get(ProcessTaskWrapper.java:153)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:277)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:237)
	at org.apache.brooklyn.entity.java.VanillaJavaAppSshDriver.customize(VanillaJavaAppSshDriver.java:99)
	at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$3$2.run(AbstractSoftwareProcessDriver.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
	at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
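The failure can be reproduced in miniature with plain JDK types. This is a sketch, not Brooklyn code. A fair `java.util.concurrent.Semaphore` with one permit stands in for the machine mutex; the second stack trace shows `SemaphoreWithOwners` delegating to `Semaphore.acquire`. A `CountDownLatch` that is never counted down stands in for the process task that never starts. The class and method names (`ZombieMutexDemo`, `secondAcquireBlocked`) are hypothetical. The second acquirer uses a timed `tryAcquire` only so the blocking can be observed; the real `acquireMutex` call in the trace waits with no timeout.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ZombieMutexDemo {

    /**
     * Returns true if a second "deploy" cannot obtain the machine mutex within
     * timeoutMillis because an earlier, never-cancelled task still holds it.
     */
    public static boolean secondAcquireBlocked(long timeoutMillis) throws InterruptedException {
        // Stand-in for the per-machine mutex (assumption: one fair permit).
        Semaphore machineMutex = new Semaphore(1, true);
        // Stand-in for the subtask that never starts: nothing ever counts this down.
        CountDownLatch neverStarted = new CountDownLatch(1);

        Thread zombie = new Thread(() -> {
            try {
                machineMutex.acquire();
                try {
                    neverStarted.await(); // waits forever, like BasicTask.blockUntilStarted
                } finally {
                    machineMutex.release(); // only reached if the wait ever ends
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // never happens here: nobody interrupts us
            }
        }, "zombie-deploy");
        zombie.setDaemon(true); // lets the JVM exit even though this thread never finishes
        zombie.start();

        // Wait until the zombie task actually holds the mutex.
        while (machineMutex.availablePermits() > 0) {
            Thread.sleep(10);
        }

        // "Stopping the app" does nothing to the zombie thread: it is never interrupted.
        // The redeployed app's task now tries to take the same mutex.
        boolean acquired = machineMutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            machineMutex.release();
        }
        return !acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second deploy blocked: " + secondAcquireBlocked(500));
    }
}
{code}

Running `main` reports that the second deploy is blocked, because nothing ever interrupts the thread holding the permit.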
The app was then redeployed to the same location (Andrea to clarify whether it was
localhost or BYON). The new deployment's attempt in ArchiveUtils.deploy to acquire
the machine mutex blocked indefinitely, because the mutex was still held by the
zombie task:
{code}
"brooklyn-execmanager-EiOrzrfj-0" #45 daemon prio=5 os_prio=31 tid=0x00007fa7f0fb3800 nid=0x7c03 waiting on condition [0x0000700002835000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000007838ff1d8> (a java.util.concurrent.Semaphore$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.brooklyn.util.core.mutex.SemaphoreWithOwners.acquire(SemaphoreWithOwners.java:51)
	at org.apache.brooklyn.util.core.mutex.MutexSupport.acquireMutex(MutexSupport.java:77)
	at org.apache.brooklyn.location.ssh.SshMachineLocation.acquireMutex(SshMachineLocation.java:1078)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:266)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:237)
	at org.apache.brooklyn.entity.java.VanillaJavaAppSshDriver.customize(VanillaJavaAppSshDriver.java:99)
	at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$3$2.run(AbstractSoftwareProcessDriver.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
	at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
The symptom was an app that stayed in the "installing archive" step forever; the
cause could only really be diagnosed from a thread dump.
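One possible mitigation can be sketched under two assumptions: ArchiveUtils.deploy releases the mutex in a `finally` block, and stopping an app could cancel its outstanding tasks with interruption. `Object.wait()`, where the zombie thread sits in `BasicTask.blockUntilStarted`, throws `InterruptedException` when the waiting thread is interrupted. If BasicTask propagates that (possibly wrapped) rather than swallowing it, cancelling the stuck task would unwind it through the `finally` and free the mutex. The sketch uses `Future.cancel(true)` on a plain `ExecutorService`. The stop hook and the names `CancelOnStopDemo` and `mutexFreeAfterCancel` are hypothetical and do not describe how Brooklyn currently stops tasks.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class CancelOnStopDemo {

    /**
     * Returns true if, after the stuck deploy task is cancelled with interruption,
     * a later deploy can obtain the machine mutex within timeoutMillis.
     */
    public static boolean mutexFreeAfterCancel(long timeoutMillis) throws InterruptedException {
        Semaphore machineMutex = new Semaphore(1, true);     // stand-in for the machine mutex
        CountDownLatch holdingMutex = new CountDownLatch(1); // signals that the task holds the mutex
        CountDownLatch neverStarted = new CountDownLatch(1); // never counted down
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> deployTask = executor.submit(() -> {
                machineMutex.acquire();
                try {
                    holdingMutex.countDown();
                    neverStarted.await(); // throws InterruptedException when cancelled
                } finally {
                    machineMutex.release(); // runs on interruption, freeing the mutex
                }
                return null;
            });
            holdingMutex.await();

            // Hypothetical stop hook: cancel the app's outstanding tasks with interruption.
            deployTask.cancel(true);

            // The redeployed app's task tries to take the same mutex.
            boolean acquired = machineMutex.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                machineMutex.release();
            }
            return acquired;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mutex free after cancel: " + mutexFreeAfterCancel(2000));
    }
}
{code}

The key design point is that the mutex is released on every exit path. Cancellation with interruption then only has to unblock the waiting thread; it does not need to know which mutexes the task holds.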