Sam,

Am I reading that correctly: you ran `clocker.sh` without either of nohup or disown, and those are not automatically run by clocker.sh?

If that is the case, the most likely explanation is that your terminal session terminated so a sighup was sent to all jobs, including your clocker process.

Aled

p.s. see the link that Alasdair sent out: http://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and


On 29/01/2015 14:23, Sam Corbett wrote:
I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
"${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
that were running do anything weird. There was no core dump. Shutdown hooks
were run in both cases.

The processes were run on a VM in Softlayer. I don't have the precise specs
of the machine to hand but it was Debian, had 8Gb of RAM and was running
OpenJDK 7. Brooklyn was - or at least should have been - operating a light
load in both cases. Each had a two-host Clocker application running and had
had a few other applications with two or three entities deployed to Clocker
and then stopped again. I don't have the machine available any more so
can't get to /var/log.

On 29 January 2015 at 13:19, Richard Downer <[email protected]> wrote:

Further questions:

Where were you running this process - on your workstation, or on a
Linux server elsewhere? What is the spec/OS of the machine running
Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
entites or SSH sensors...)

If this was on a Linux server, I'm wondering if the JVM holding
Brooklyn used up all of the server's memory and the OOM killer was
invoked. If so you should see a message in the output of the "dmesg"
command, or in /var/log/syslog or /var/log/messages.

Richard.


On 29 January 2015 at 13:11, Aled Sage <[email protected]> wrote:
Hi Sam,

First quick questions:

  * Was brooklyn definitely run with `nohup` or `disown`?
  * Are you running with any unusual entities that might inadvertently
    have a System.exit or some such?!
  * I presume there was no core dump file in the run directory?

The InterruptedException suggest this might be a relatively gracefully
shutdown. Do you see evidence that the shutdown hook has been called (so
the
management context was shut down cleanly)?

Aled


On 29/01/2015 11:07, Sam Corbett wrote:
Hi all,

I'd like help getting to the bottom of the unexpected termination of a
Brooklyn process. It hit me twice in a row yesterday, once with nothing
weird in the logs and once with a number of stacktraces indicating an
InterruptedException was thrown. Both processes were run on the same
host
in Softlayer and were running Clocker.

The first deployment seemed to be working normally. I had deployed a few
applications to my-docker-cloud and stopped them again. A moment later
and
the process had stopped. The last thing the server did was check the
status
of a Weave container:

2015-01-28 07:16:15,002 DEBUG brooklyn.SSH
[brooklyn-execmanager-BL31ZSeZ-2001]: check-running
WeaveContainerImpl{id=r3fpQokV}, on machine
SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05],
completed: return status 0
2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer
[shutdownHookThread]: BrooklynWebServer detected shutdown: stopping
web-console

There were no interesting exceptions in the debug log.

In the second case the process stopped as Brooklyn waited for the status
of
a service that did not provision. This time there was a (lot of)
stacktrace
in the logs. Most pertinently perhaps was:

2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager
[brooklyn-execmanager-YRXvc51z-1676]: Exception running task
Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException:
sleep interrupted
brooklyn.util.exceptions.RuntimeInterruptedException:
java.lang.InterruptedException: sleep interrupted
at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:312)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.time.Time.sleep(Time.java:318)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.util.repeat.Repeater.run(Repeater.java:305)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at


brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at


brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at


brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at


brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431)
~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
~[na:1.7.0_65]
at


brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at


brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_65]
at


java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_65]
at


java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
at brooklyn.util.time.Time.sleep(Time.java:310)
~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
... 15 common frames omitted

I've got full logs for each run of the server but I didn't get the exit
code of either process. I ran a third test on the same host later in the
day and nothing went wrong (in a reasonable timeframe).

Has anyone experienced this before? Are there any system logs I could
have
looked to for more information? A brief look at the standard /var/log
files
revealed nothing.

It was a bit alarming to see that in the first instance the process
stopped
with no indication why.

Sam


Reply via email to