Argh. That's boring and plausible. I will always `nohup .. &` in future.
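
For the archive, here is a rough sketch of the difference, assuming a POSIX shell. `sleep 30` is only a stand-in for `./clocker.sh`, and `hup_exit_status` is a hypothetical helper written for this illustration, not something that ships with Clocker or Brooklyn. It sends SIGHUP to a background job (roughly what happens when the terminal session goes away) and then reads the job's exit status to see which signal actually ended it.

```shell
#!/bin/sh
# Sketch only: `sleep 30` stands in for ./clocker.sh.
# hup_exit_status is a hypothetical helper for this illustration.

hup_exit_status() {
    # Start the given command in the background, discarding its output.
    "$@" > /dev/null 2>&1 &
    pid=$!
    sleep 1                        # let nohup install its SIGHUP disposition
    kill -HUP "$pid"               # simulate the terminal hanging up
    sleep 1
    # Clean up: if the job ignored SIGHUP it is still alive, so end it with TERM.
    kill -TERM "$pid" 2>/dev/null || true
    STATUS=0
    wait "$pid" || STATUS=$?       # 128 + number of the signal that ended the job
}

hup_exit_status sleep 30
PLAIN_STATUS=$STATUS

hup_exit_status nohup sleep 30
NOHUP_STATUS=$STATUS

# 129 = 128 + SIGHUP: the hangup ended the job.
# 143 = 128 + SIGTERM: the job outlived the hangup and was ended by the cleanup.
echo "plain: exit status $PLAIN_STATUS"
echo "nohup: exit status $NOHUP_STATUS"
```

The plain job would normally report 129 and the `nohup` one 143, unless the calling shell was itself started with SIGHUP ignored, in which case both survive. For the real thing the equivalent would be something like `nohup ./clocker.sh ... > console.log 2>&1 &`, where the log file name is just an example.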

On 29 January 2015 at 14:29, Aled Sage <[email protected]> wrote:

> Sam,
>
> Am I reading that correctly: you ran `clocker.sh` without either of nohup
> or disown, and those are not automatically run by clocker.sh?
>
> If that is the case, the most likely explanation is that your terminal
> session terminated so a SIGHUP was sent to all jobs, including your clocker
> process.
>
> Aled
>
> p.s. see the link that Alasdair sent out:
> http://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and
>
> On 29/01/2015 14:23, Sam Corbett wrote:
>
>> I ran clocker.sh, which runs `exec java ${JAVA_OPTS} -cp
>> "${INITIAL_CLASSPATH}" brooklyn.clocker.Main "$@"`. None of the entities
>> that were running do anything weird. There was no core dump. Shutdown
>> hooks were run in both cases.
>>
>> The processes were run on a VM in Softlayer. I don't have the precise
>> specs of the machine to hand but it was Debian, had 8 GB of RAM and was
>> running OpenJDK 7. Brooklyn was - or at least should have been - operating
>> under a light load in both cases. Each had a two-host Clocker application
>> running and had had a few other applications with two or three entities
>> deployed to Clocker and then stopped again. I don't have the machine
>> available any more so can't get to /var/log.
>>
>> On 29 January 2015 at 13:19, Richard Downer <[email protected]> wrote:
>>
>>> Further questions:
>>>
>>> Where were you running this process - on your workstation, or on a
>>> Linux server elsewhere? What is the spec/OS of the machine running
>>> Brooklyn? How much "stuff" was going on in Brooklyn? (large number of
>>> entities or SSH sensors...)
>>>
>>> If this was on a Linux server, I'm wondering if the JVM holding
>>> Brooklyn used up all of the server's memory and the OOM killer was
>>> invoked. If so you should see a message in the output of the "dmesg"
>>> command, or in /var/log/syslog or /var/log/messages.
>>>
>>> Richard.
>>>
>>> On 29 January 2015 at 13:11, Aled Sage <[email protected]> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> First quick questions:
>>>>
>>>> * Was brooklyn definitely run with `nohup` or `disown`?
>>>> * Are you running with any unusual entities that might inadvertently
>>>>   have a System.exit or some such?!
>>>> * I presume there was no core dump file in the run directory?
>>>>
>>>> The InterruptedException suggests this might be a relatively graceful
>>>> shutdown. Do you see evidence that the shutdown hook has been called (so
>>>> the management context was shut down cleanly)?
>>>>
>>>> Aled
>>>>
>>>> On 29/01/2015 11:07, Sam Corbett wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like help getting to the bottom of the unexpected termination of a
>>>>> Brooklyn process. It hit me twice in a row yesterday, once with nothing
>>>>> weird in the logs and once with a number of stacktraces indicating an
>>>>> InterruptedException was thrown. Both processes were run on the same
>>>>> host in Softlayer and were running Clocker.
>>>>>
>>>>> The first deployment seemed to be working normally. I had deployed a
>>>>> few applications to my-docker-cloud and stopped them again. A moment
>>>>> later the process had stopped. The last thing the server did was check
>>>>> the status of a Weave container:
>>>>>
>>>>> 2015-01-28 07:16:15,002 DEBUG brooklyn.SSH [brooklyn-execmanager-BL31ZSeZ-2001]: check-running WeaveContainerImpl{id=r3fpQokV}, on machine SshMachineLocation[159.8.36.8:159.8.36.8/159.8.36.8:22@DKRM1V05], completed: return status 0
>>>>> 2015-01-28 07:16:15,371 DEBUG b.launcher.BrooklynWebServer [shutdownHookThread]: BrooklynWebServer detected shutdown: stopping web-console
>>>>>
>>>>> There were no interesting exceptions in the debug log.
>>>>>
>>>>> In the second case the process stopped as Brooklyn waited for the
>>>>> status of a service that did not provision. This time there was a (lot
>>>>> of) stacktrace in the logs. Perhaps the most pertinent was:
>>>>>
>>>>> 2015-01-28 09:07:49,891 DEBUG b.u.task.BasicExecutionManager [brooklyn-execmanager-YRXvc51z-1676]: Exception running task Task[post-start:ihocjzth] (rethrowing): java.lang.InterruptedException: sleep interrupted
>>>>> brooklyn.util.exceptions.RuntimeInterruptedException: java.lang.InterruptedException: sleep interrupted
>>>>>     at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:89) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.util.time.Time.sleep(Time.java:312) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.util.time.Time.sleep(Time.java:318) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:382) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.util.repeat.Repeater.run(Repeater.java:305) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1028) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:370) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:367) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:160) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:431) ~[brooklyn-software-base-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
>>>>>     at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:337) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
>>>>> Caused by: java.lang.InterruptedException: sleep interrupted
>>>>>     at java.lang.Thread.sleep(Native Method) [na:1.7.0_65]
>>>>>     at brooklyn.util.time.Time.sleep(Time.java:310) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:0.7.0-SNAPSHOT]
>>>>>     ... 15 common frames omitted
>>>>>
>>>>> I've got full logs for each run of the server but I didn't get the exit
>>>>> code of either process. I ran a third test on the same host later in
>>>>> the day and nothing went wrong (in a reasonable timeframe).
>>>>>
>>>>> Has anyone experienced this before? Are there any system logs I could
>>>>> have looked to for more information? A brief look at the standard
>>>>> /var/log files revealed nothing.
>>>>>
>>>>> It was a bit alarming to see that in the first instance the process
>>>>> stopped with no indication why.
>>>>>
>>>>> Sam
