I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
"${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
that were running do anything weird. There was no core dump. Shutdown hooks
were run in both cases.

The processes were run on a VM in Softlayer. I don't have the precise
specs of the machine to hand, but it was Debian, had 8 GB of RAM and was
running OpenJDK 7. Brooklyn was - or at least should have been - operating
under a light load in both cases. Each had a two-host Clocker application
running and had had a few other applications with two or three entities
deployed to Clocker and then stopped again.

I don't have the machine available any more, so I can't get to /var/log.

On 29 January 2015 at 13:19, Richard Downer <[email protected]> wrote:
> Further questions:
>
> Where were you running this process - on your workstation, or on a
> Linux server elsewhere? What is the spec/OS of the machine running
> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
> entities or SSH sensors...)
>
> If this was on a Linux server, I'm wondering if the JVM holding
> Brooklyn used up all of the server's memory and the OOM killer was
> invoked. If so you should see a message in the output of the "dmesg"
> command, or in /var/log/syslog or /var/log/messages.
>
> Richard.
>
>
> On 29 January 2015 at 13:11, Aled Sage <[email protected]> wrote:
> > Hi Sam,
> >
> > First quick questions:
> >
> >  * Was brooklyn definitely run with `nohup` or `disown`?
> >  * Are you running with any unusual entities that might inadvertently
> >    have a System.exit or some such?!
> >  * I presume there was no core dump file in the run directory?
> >
> > The InterruptedException suggests this might be a relatively graceful
> > shutdown. Do you see evidence that the shutdown hook has been called
> > (so the management context was shut down cleanly)?
> >
> > Aled
> >
> >
> > On 29/01/2015 11:07, Sam Corbett wrote:
> >>
> >> Hi all,
> >>
> >> I'd like help getting to the bottom of the unexpected termination of a
> >> Brooklyn process. It hit me twice in a row yesterday, once with nothing
> >> weird in the logs and once with a number of stacktraces indicating an
> >> InterruptedException was thrown. Both processes were run on the same
> >> host in Softlayer and were running Clocker.
> >>
> >> The first deployment seemed to be working normally. I had deployed a few
> >> applications to my-docker-cloud and stopped them again. A moment later
> >> and the process had stopped. The last thing the server did was check the
> >> status of a Weave container:
> >>
> >> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH [brooklyn-execmanager-BL31ZSeZ-2001]: check-running WeaveContainerImpl{id=r3fpQokV}, on machine SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05], completed: return status 0
> >> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping web-console
> >>
> >> There were no interesting exceptions in the debug log.
> >>
> >> In the second case the process stopped as Brooklyn waited for the status
> >> of a service that did not provision. This time there was a (lot of)
> >> stacktrace in the logs. Most pertinently perhaps was:
> >>
> >> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager [brooklyn-execmanager-YRXvc51z-1676]: Exception running task Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException: sleep interrupted
> >> brooklyn.util.exceptions.RuntimeInterruptedException: java.lang.InterruptedException: sleep interrupted
> >>     at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.time.Time.sleep(Time.java:312) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.time.Time.sleep(Time.java:318) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.repeat.Repeater.run(Repeater.java:305) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
> >>     at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
> >>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> >> Caused by: java.lang.InterruptedException: sleep interrupted
> >>     at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
> >>     at brooklyn.util.time.Time.sleep(Time.java:310) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
> >>     ... 15 common frames omitted
> >>
> >> I've got full logs for each run of the server but I didn't get the exit
> >> code of either process. I ran a third test on the same host later in the
> >> day and nothing went wrong (in a reasonable timeframe).
> >>
> >> Has anyone experienced this before? Are there any system logs I could
> >> have looked to for more information? A brief look at the standard
> >> /var/log files revealed nothing.
> >>
> >> It was a bit alarming to see that in the first instance the process
> >> stopped with no indication why.
> >>
> >> Sam
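
Since the exit code of neither process was captured, a minimal sketch of
how it could be recorded and interpreted on a future run may be useful.
It relies on the usual shell convention that a status above 128 means the
process was terminated by signal (status - 128). The helper name
describe_exit and the log file name in the usage comment are hypothetical
and not part of clocker.sh. One hedged observation: the OOM killer sends
SIGKILL, which the JVM cannot catch, so shutdown hooks would not normally
run in that case; a status such as 143 (SIGTERM) or 129 (SIGHUP) would be
more consistent with the hooks having run.

```shell
# Hypothetical helper (not part of clocker.sh): describe a shell exit status.
# Assumes the common convention that status > 128 means "killed by signal
# (status - 128)". Only a handful of signal numbers are named here.
describe_exit() {
    status="$1"
    if [ "$status" -le 128 ]; then
        echo "exited with status ${status}"
        return
    fi
    sig=$((status - 128))
    case "$sig" in
        1)  name="SIGHUP" ;;   # e.g. controlling terminal closed
        2)  name="SIGINT" ;;   # e.g. Ctrl-C
        9)  name="SIGKILL" ;;  # e.g. OOM killer; cannot be caught
        15) name="SIGTERM" ;;  # e.g. plain `kill <pid>`
        *)  name="other" ;;
    esac
    echo "killed by signal ${sig} (${name})"
}

# Usage sketch (not run here; the log file name is an assumption):
#   ./clocker.sh "$@"
#   describe_exit $? >> brooklyn-exit-status.log
```

The second command in the usage sketch has to run in the shell that
launched clocker.sh, immediately after it returns, so that $? still holds
the status of the Brooklyn process.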
