Hi,

On Tue, Jan 22, 2008 at 05:27:59PM +0100, Lars Marowsky-Bree wrote:
> On 2008-01-22T17:37:15, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
> > The background of why we developed this tool is that:
> > 1) We want to detect a process failure asynchronously,
> >    not only by the periodic monitor operations, to cause a
> >    failover faster to minimize the service downtime.
>
> Right, that's a good idea.
>
> > 2) We want to make it usable as "an additional feature" for
> >    arbitrary applications without modifying existent RAs and
> >    the application itself.
>
> I see your point, and that's certainly a valid use case.
>
> Nevertheless, I'd argue that for our included RAs, it would be nice if
> they would auto-register and make this functionality available
> completely automatically. This would be easier for users, and easier to
> maintain.
>
> Even if not all RAs do that immediately, I think having simple-to-use
> shell functions for them to do so would be immensely useful.
>
> The RAs would also selectively sign in and out as resources are stopped
> or started, or add/remove processes from the watch list as required by
> other actions (such as promote/demote, or other extensions in the
> future.) I think that would be more fine-grained.

Seconded. This belongs in the RA proper, because the RAs have the exact
information about which processes run, what they look like, and which
should be monitored. Duplicating that information in the configuration
would make maintenance more difficult and error-prone.

> > But for those techniques, waitpid() can handle only its child
> > process and it can not be used to monitor a process launched
> > by heartbeat. By using poll()/select()/inotify(), it cannot be
> > detected if a process gets to "the zombie state" as far as we studied.
> > Please let me know if I'm wrong, or there's a better way to do this.

Advanced Programming in the UNIX Environment by W. Richard Stevens
is probably the place to investigate this further.

> No, I think you are right. I didn't consider the Z state. It might be
> possible to somehow get at that state asynchronously via inotify() or
> kernel events, but I don't readily know how.
>
> Using these async mechanisms though would provide a further speed
> advantage, and reduce the load (less polling). Processes dying
> completely is also, I think, more likely than processes going zombie.
>
> Maybe a future version could combine both techniques? Use async
> notifications to capture process deaths immediately, and periodically
> scan (possibly at a lower frequency) for zombies.
>
> Or leave the zombie scan (as well as checking for otherwise unresponsive
> or malfunctioning processes) to the monitor op of the RA proper.

This is probably the approach entailing the least effort. It is fairly
easy to check the state of the process in a shell script.

> > the procd is already using the asynchronous notification to the
> > CRM in the same manner of 'crm_resource -F' command and that is
> > the primary purpose of this tool.
> >
> > Please point me out if I'm misunderstanding what you mean.
>
> No, I misread the code. Thanks for correcting me.
>
> > > procd also probably should be started by a RA, not by a respawn line.
> > It's a respawned daemon because it can be used if you want to
> > monitor two or more applications.
>
> Agreed, but I guess making the daemon a resource which is managed itself
> would make it possible to monitor and restart as needed. Same as for
> pingd, I think.
>
> > Thank you again for all of your comments.
> >
> > I'll start to fix them and if there're further comments please
> > let me know.
>
> Thanks for this useful tool!

Indeed. Great contribution!

Dejan

> Regards,
>     Lars
>
> --
> Teamlead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
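
For reference, the zombie check that the thread suggests leaving to the
RA's monitor op could look roughly like the sketch below. It is written
in Python rather than shell and assumes Linux with /proc mounted. The
names parse_proc_state, process_state and is_healthy are illustrative
only; they are not part of procd, heartbeat, or any existing RA.

```python
import os


def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Return the state letter for pid, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            return parse_proc_state(stat_file.read())
    except (FileNotFoundError, ProcessLookupError):
        # No /proc entry (or it vanished mid-read): the process has exited.
        return None


def is_healthy(pid):
    """True if pid exists and is not a zombie (state 'Z')."""
    state = process_state(pid)
    return state is not None and state != "Z"


if __name__ == "__main__":
    # Inspect our own process as a simple demonstration.
    print(process_state(os.getpid()), is_healthy(os.getpid()))
```

A monitor operation could treat a false result from is_healthy(pid) as
a failure. This is only the periodic half of the combined approach
discussed above; it does not provide asynchronous notification.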
