Hi Mike,

I wasted a whole day tracking down what I believe to be the same bug. This is the only discussion thread about the issue that I can find, so I want to put this out here.
If you start god without specifying a config file, the events module doesn't initialize properly, and god isn't notified when a watched process exits. The fix is as simple as adding `-c <config file>` to the command line you used to start god. I'll leave it to people who understand god's internals to explain why this would be the case.

Prathan

P.S.
$ sudo god -V
Version: 0.13.3
Polls: enabled
Events: netlink

On Saturday, May 4, 2013 7:38:07 AM UTC+7, Mike Fellows wrote:
>
> Apologies if the answer to this is obvious but I am having trouble getting
> event based monitoring working on either OSX 10.7.5 or a RHEL Linux variant
> (the latest AWS Linux AMI).
>
> In both cases *god check* is reporting success for event based
> monitoring, and I have confirmed for the Linux OS that the
> CONFIG_PROC_EVENTS option was set to 'y' when the kernel was compiled.
> Here is the output from god check on OSX; the Linux god check returns the
> same output, although it uses the 'netlink' event system.
>
> $ rvmsudo god check
> using event system: kqueue
> starting event handler
> forking off new process
> forked process with pid = 1504
> killing process
> [ok] process exit event received
>
>
> I've reduced this down to a very simple test case. It is a
> simplification of the example from http://godrb.com. I also tried the
> full sample file from the site, with no luck. Here is the simplified
> god file.
>
> root = File.dirname(File.expand_path(__FILE__))
>
> God.watch do |w|
>   w.name = "simple_test"
>   w.interval = 30.seconds
>   w.start = "#{root}/simple_test.rb"
>   w.log = "#{root}/simple_test.log"
>   w.uid = 'mike'
>   w.dir = root
>
>   # determine the state on startup
>   w.transition(:init, { true => :up, false => :start }) do |on|
>     on.condition(:process_running) do |c|
>       c.running = true
>     end
>   end
>
>   # determine when process has finished starting
>   w.transition([:start, :restart], :up) do |on|
>     on.condition(:process_running) do |c|
>       c.running = true
>     end
>
>     # failsafe
>     on.condition(:tries) do |c|
>       c.times = 5
>       c.transition = :start
>     end
>   end
>
>   # start if process is not running
>   w.transition(:up, :start) do |on|
>     on.condition(:process_exits)
>   end
> end
>
>
> And here is the output from god when it is started in foreground mode and
> the god file is loaded.
>
> $ rvmsudo god -D --no-syslog --log-level debug
> I [2013-05-03 17:28:13] INFO: Syslog disabled.
> I [2013-05-03 17:28:13] INFO: Using pid file directory: /var/run/god
> I [2013-05-03 17:28:13] INFO: Started on drbunix:///tmp/god.17165.sock
> I [2013-05-03 17:28:40] INFO: simple_test Loaded config
> I [2013-05-03 17:28:40] INFO: simple_test move 'unmonitored' to 'init'
> D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007f806081d570> in 0 seconds
> I [2013-05-03 17:28:40] INFO: simple_test moved 'unmonitored' to 'init'
> I [2013-05-03 17:28:40] INFO: simple_test [trigger] process is not running (ProcessRunning)
> D [2013-05-03 17:28:40] DEBUG: simple_test ProcessRunning [false] {true=>:up, false=>:start}
> I [2013-05-03 17:28:40] INFO: simple_test move 'init' to 'start'
> I [2013-05-03 17:28:40] INFO: simple_test start: /Users/mike/code/STAT-HQ/test/god/simple_test/simple_test.rb
> D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007f806081fc58> in 0 seconds
> D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::Tries:0x007f806081f8e8> in 0 seconds
> I [2013-05-03 17:28:40] INFO: simple_test moved 'init' to 'start'
> D [2013-05-03 17:28:40] DEBUG: simple_test ProcessRunning [true] {true=>:up}
> I [2013-05-03 17:28:40] INFO: simple_test move 'start' to 'up'
> I [2013-05-03 17:28:40] INFO: simple_test registered 'proc_exit' event for pid 1659
> I [2013-05-03 17:28:40] INFO: simple_test moved 'start' to 'up'
>
>
> If I then kill the process that god is supposed to be monitoring there is
> no output from god and the simple_test.rb program is not restarted. Here
> is the output from killing the process.
>
> sparrow:simple_test mike$ rvmsudo god load simple_test.god
> Sending 'load' command with action 'leave'
>
> The following tasks were affected:
>   simple_test
> $ ps -ef | grep simple_test.rb
>   501  1659     1   0  5:28pm ??         0:00.03 /usr/bin/ruby /Users/mike/code/STAT-HQ/test/god/simple_test/simple_test.rb
>   501  1663   875   0  5:28pm ttys004    0:00.00 grep simple_test.rb
> $ kill 1659
> $ ps -ef | grep simple_test.rb
>   501  1665   875   0  5:28pm ttys004    0:00.00 grep simple_test.rb
> $ ps -ef | grep simple_test.rb
>   501  1667   875   0  5:29pm ttys004    0:00.00 grep simple_test.rb
> $ ps -ef | grep simple_test.rb
>   501  1669   875   0  5:30pm ttys004    0:00.00 grep simple_test.rb
>
>
> I've read all the documentation and been through some of the source code
> for god but I am not finding any obvious clues. Does my god file look
> reasonable? Would anyone have any suggestions for other steps to take to
> troubleshoot?
>
> As a postscript - I am able to get god to monitor and restart processes
> using polling but I would prefer to use the event based technique.
>
> Thanks for any help.
>
> Regards,
> Mike
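For anyone who finds this thread later, here is a minimal sketch of the workaround from the top of this message: pass the config file when god is started, rather than starting god bare and running `god load` afterwards. The helper name `god_command` and its keyword arguments are made up for illustration and are not part of god; the flags are only the ones already used in this thread (`-c`, `-D`, `--no-syslog`, `--log-level`).

```ruby
# Hypothetical helper (not part of god): builds a god command line that
# always includes -c, since starting god without a config file is what
# appeared to leave the exit events undelivered in this thread.
def god_command(config_path, foreground: true, log_level: "debug")
  if config_path.nil? || config_path.to_s.strip.empty?
    raise ArgumentError, "a config file is required (god -c <config file>)"
  end

  cmd = ["god", "-c", config_path.to_s]
  # -D keeps god in the foreground, as in the session quoted above.
  cmd += ["-D", "--no-syslog", "--log-level", log_level] if foreground
  cmd
end

# Print the command instead of running it.
puts god_command("simple_test.god").join(" ")
```

To actually launch god from a wrapper script, the `puts` line could be replaced with `exec(*god_command("simple_test.god"))`, run with whatever privileges god needs on your system (Mike used `rvmsudo`).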
