Further questions:

Where were you running this process - on your workstation, or on a
Linux server elsewhere? What is the spec/OS of the machine running
Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
entities or SSH sensors...)

If this was on a Linux server, I'm wondering if the JVM holding
Brooklyn used up all of the server's memory and the OOM killer was
invoked. If so, you should see a message in the output of the "dmesg"
command, or in /var/log/syslog or /var/log/messages.
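
As a rough sketch of that check, the Python below scans log lines for the
wording the kernel typically uses when the OOM killer fires. The function name
`find_oom_lines`, the `process_hint` parameter and the pattern itself are
assumptions made for illustration; the exact kernel wording varies between
versions, so treat a miss as inconclusive rather than as proof.

```python
import re

# Phrases commonly seen in kernel logs around an OOM kill (assumed wording;
# it differs slightly between kernel versions).
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process|Killed process \d+",
    re.IGNORECASE,
)


def find_oom_lines(lines, process_hint="java"):
    """Return lines that look like OOM-killer activity and mention process_hint."""
    hits = []
    for line in lines:
        if OOM_PATTERN.search(line) and process_hint in line:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, pass in the lines of the "dmesg" output, or an open file handle for
/var/log/syslog or /var/log/messages (whichever exists on the distribution),
and print whatever comes back.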

Richard.


On 29 January 2015 at 13:11, Aled Sage <[email protected]> wrote:
> Hi Sam,
>
> First quick questions:
>
>  * Was brooklyn definitely run with `nohup` or `disown`?
>  * Are you running with any unusual entities that might inadvertently
>    have a System.exit or some such?!
>  * I presume there was no core dump file in the run directory?
>
> The InterruptedException suggests this might be a relatively graceful
> shutdown. Do you see evidence that the shutdown hook has been called (so the
> management context was shut down cleanly)?
>
> Aled
>
>
> On 29/01/2015 11:07, Sam Corbett wrote:
>>
>> Hi all,
>>
>> I'd like help getting to the bottom of the unexpected termination of a
>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>> weird in the logs and once with a number of stacktraces indicating an
>> InterruptedException was thrown. Both processes were run on the same host
>> in Softlayer and were running Clocker.
>>
>> The first deployment seemed to be working normally. I had deployed a few
>> applications to my-docker-cloud and stopped them again. A moment later and
>> the process had stopped. The last thing the server did was check the status
>> of a Weave container:
>>
>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH [brooklyn-execmanager-BL31ZSeZ-2001]: check-running WeaveContainerImpl{id=r3fpQokV}, on machine SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05], completed: return status 0
>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping web-console
>>
>> There were no interesting exceptions in the debug log.
>>
>> In the second case the process stopped as Brooklyn waited for the status of
>> a service that did not provision. This time there were a lot of stack traces
>> in the logs. Perhaps the most pertinent was:
>>
>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager [brooklyn-execmanager-YRXvc51z-1676]: Exception running task Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException: sleep interrupted
>> brooklyn.util.exceptions.RuntimeInterruptedException: java.lang.InterruptedException: sleep interrupted
>>     at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.util.time.Time.sleep(Time.java:312) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.util.time.Time.sleep(Time.java:318) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.util.repeat.Repeater.run(Repeater.java:305) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
>>     at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>> Caused by: java.lang.InterruptedException: sleep interrupted
>>     at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>>     at brooklyn.util.time.Time.sleep(Time.java:310) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>     ... 15 common frames omitted
>>
>> I've got full logs for each run of the server but I didn't get the exit
>> code of either process. I ran a third test on the same host later in the
>> day and nothing went wrong (in a reasonable timeframe).
>>
>> Has anyone experienced this before? Are there any system logs I could have
>> looked at for more information? A brief look at the standard /var/log files
>> revealed nothing.
>>
>> It was a bit alarming to see that in the first instance the process stopped
>> with no indication why.
>>
>> Sam
>>
>
