Hi,

On Wed, Apr 04, 2012 at 07:42:12PM +0200, David Gubler wrote:
> Hi Dejan,
>
> On 04.04.2012 17:56, Dejan Muhamedagic wrote:
> > The timeout is a timeout, wherever it happens.
>
> Unfortunately not! If the monitor operation times out and Pacemaker
> moves on, the wget process (and thus the whole monitor process) will
> keep running. In fact, it may still be running many minutes after the
> timeout happened. And since the monitor (at least in case of the apache
> resource agent) can't be run twice in parallel, this effectively
> prevents further monitor operations until wget has timed out. And that's
> exactly where we get a problem.
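To make the problem described above concrete: the status check only stays inside the monitor timeout if the HTTP fetch itself has a short, explicit timeout. The following is a minimal sketch in Python, not the apache resource agent's code (which is a shell script calling wget or curl). The function name `status_ok` and the 5-second default are assumptions made for illustration.

```python
import http.client
import urllib.request


def status_ok(url, timeout_s=5.0):
    """Return True if `url` answers with HTTP 200, False on any failure.

    Hypothetical helper: `timeout_s` bounds each blocking socket operation
    (connect, read). It is not a hard limit on the total request time.
    """
    # An empty ProxyHandler bypasses any proxy set in the environment,
    # which is what a health check against localhost wants.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, socket timeouts, and HTTP errors
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False
```

Called as `status_ok("http://localhost/server-status", timeout_s=5.0)`, a hung server makes the check fail after roughly the timeout rather than blocking for minutes, provided the timeout is chosen well below the monitor operation's own timeout.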

Hmm, the process running the monitor operation should be removed
(killed) by lrmd on timeout. If that doesn't happen, then you
just hit a jackpot bug!

> > So, you want the resource agent to notice while running monitor
> > that it can now talk to the server?
>
> Yes, I want automatic recovery. The resource agent should notice when
> apache is back and working again. And that works fine with a patched
> apache resource agent.

Hmm, I thought we were past this... and I still don't see the
patch :)

Cheers,

Dejan

> >> On a side note:
> >> The apache resource agent allows supplying a config file, where one can
> >> override the parameters for curl/wget. But the implementation here is
> >> bogus, because even if you supply this file, it always does a default
> >> test with default parameters first, so this is useless in this case...
> >> (I consider this behavior to be a bug).
> > If you use a config test file, you'd need to define a monitor
> > with depth 10. The depth 0 monitor (default) is always testing
> > the statusurl.
> Yes, I figured that, but it's beside the point. If I use depth 10, it
> will first do the simple (depth 0) test anyway (!), and after that the
> advanced (depth 10) test. And since the simple test doesn't have a
> useful timeout for wget, it will still stall for a long time if apache
> doesn't respond, and it is irrelevant what the advanced test does.
>
> @simple tests: Even though we run a complex web application behind
> apache (apache acts as a load balancer using mod_jk), I don't want a
> more complex test than fetching /server-status on localhost. This simple
> test already shows that apache is working and has threads available for
> clients to connect. Failover for our application servers is done by
> mod_jk, I don't need Heartbeat/Pacemaker for that.
> Think of it as independent failover at each layer: Virtual IPs with
> Heartbeat/Pacemaker for failover between Apaches, mod_jk for failover
> between Tomcats, mmm_monitor for failover between MySQL servers.
>
>
> Best regards,
>
> David
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
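
On the point made earlier in this message, that the process running the monitor operation should be killed on timeout: the sketch below shows one general way a supervisor can enforce that on a POSIX system, by putting the child in its own process group and signalling the whole group, so that helpers the child spawned (such as wget) do not outlive it. This is an illustration in Python under stated assumptions, not lrmd's actual implementation. The name `run_with_deadline` is hypothetical.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout_s):
    """Run `cmd`; if it exceeds `timeout_s` seconds, kill its process group.

    Returns (timed_out, returncode). POSIX only (uses os.killpg).
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its process group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        proc.wait(timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        # Signal the whole group: the child plus anything it started
        # that stayed in the same group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return True, proc.returncode
```

Killing only the direct child would leave any still-running helper process behind, which is the kind of leftover described in the quoted report. Whether that is what happened in this case is not established in this thread.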
