Hi all,

I'd like help getting to the bottom of the unexpected termination of a
Brooklyn process. It happened twice in a row yesterday: once with nothing
unusual in the logs, and once with a number of stack traces indicating that
an InterruptedException was thrown. Both processes ran on the same host in
SoftLayer and were running Clocker.

The first deployment seemed to be working normally. I had deployed a few
applications to my-docker-cloud and stopped them again. A moment later
the process had stopped. The last thing the server did was check the status
of a Weave container:

2015-01-28 07:16:15,002 DEBUG brooklyn.SSH [brooklyn-execmanager-BL31ZSeZ-2001]: check-running WeaveContainerImpl{id=r3fpQokV}, on machine SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05], completed: return status 0
2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping web-console

There were no interesting exceptions in the debug log.

In the second case the process stopped while Brooklyn was waiting for the
status of a service that did not provision. This time there were a lot of
stack traces in the logs. Perhaps the most pertinent was:

2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager [brooklyn-execmanager-YRXvc51z-1676]: Exception running task Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException: sleep interrupted
brooklyn.util.exceptions.RuntimeInterruptedException: java.lang.InterruptedException: sleep interrupted
at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:312) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:318) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.run(Repeater.java:305) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
at brooklyn.util.time.Time.sleep(Time.java:310) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
... 15 common frames omitted
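
For context on the trace above: the innermost frame is Thread.sleep, and an InterruptedException there means some other thread interrupted the sleeping worker. One common way that happens is an executor being shut down with shutdownNow(). The sketch below reproduces that pattern in isolation. It is only an illustration of the JDK behaviour; the class and method names are made up for this mail, and I am assuming, not asserting, that something similar happened inside Brooklyn.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Brooklyn.
public class SleepInterruptDemo {

    /** Runs a sleeping task, interrupts it via shutdownNow(), and returns what the task caught. */
    static Exception interruptSleepingTask() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        final CountDownLatch started = new CountDownLatch(1);

        Callable<Exception> task = () -> {
            started.countDown();          // signal that the worker thread is running
            try {
                Thread.sleep(60_000);     // stands in for a "wait for service up" poll
                return null;              // only reached if nobody interrupts the sleep
            } catch (InterruptedException e) {
                return e;                 // hand the exception back for inspection
            }
        };

        Future<Exception> result = executor.submit(task);
        started.await();                  // make sure the task has begun
        executor.shutdownNow();           // interrupts the worker thread
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return result.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Task caught: " + interruptSleepingTask());
    }
}
```

If the task reports an InterruptedException here, that matches the shape of the "Caused by" section in the trace, which would be consistent with the stack traces being a symptom of the shutdown rather than its cause.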

I've got full logs for each run of the server but I didn't get the exit
code of either process. I ran a third test on the same host later in the
day and nothing went wrong (in a reasonable timeframe).
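
To avoid losing the exit code next time, I could start the server through a small wrapper that waits for the process and records how it ended. The following is a minimal sketch under stated assumptions: ExitCodeLogger and runAndReport are hypothetical names, and the command to run (however Brooklyn is normally launched) is passed in as arguments rather than hard-coded.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical wrapper, not part of Brooklyn.
public class ExitCodeLogger {

    /** Starts the command, waits for it to finish, and returns its exit code. */
    static int runAndReport(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()              // pass the child's stdin/stdout/stderr straight through
                .start();
        int exitCode = process.waitFor(); // blocks until the child terminates
        System.err.println(new Date() + " command " + command
                + " exited with code " + exitCode);
        return exitCode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExitCodeLogger <command> [args...]");
            System.exit(2);
        }
        // Propagate the child's exit code so callers of the wrapper still see it.
        System.exit(runAndReport(Arrays.asList(args)));
    }
}
```

On Linux an exit code above 128 usually means the process was ended by a signal (for example 143 for SIGTERM or 137 for SIGKILL), which would at least narrow down whether something outside the JVM stopped it.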

Has anyone experienced this before? Are there any system logs I could have
checked for more information? A brief look at the standard /var/log files
revealed nothing.

It was a bit alarming to see that in the first instance the process stopped
with no indication why.

Sam
