On Monday 19 February 2007 19:22, Peter wrote:
> On Mon, 19 Feb 2007, Tzahi Fadida wrote:
> > I did not understand the current problem, however, just giving my 2c
> > blindly so...:
> > 1. Internet sockets using tcp protocol is the same as using select with a
> > timeout since this is what the tcp protocol is doing. Timeouts.
>
> No. The internet sockets have a close() call which notifies the other
> side that this side went down. They also have a timeout in case this
> does not happen (SO_LINGER etc), or if the other side was nuked. Local
> sockets have none of these properties. The OS assumes that whoever made
> the sockets will take care of them. There is no nanny.
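
For concreteness, the select-with-a-timeout approach from point 1 could look
roughly like the Python sketch below. The helper name wait_for_data, the 0.1
second timeout and the 4096-byte read size are illustrative assumptions, not
taken from any program in this thread. A timeout only says that nothing
arrived in time; it cannot tell a dead peer from a quiet one.

```python
import select
import socket


def wait_for_data(sock, timeout_s):
    """Return the bytes received, or None if nothing arrived within timeout_s.

    An empty bytes object means the peer closed its end of the connection.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None  # timeout: the peer may be dead, or merely quiet
    return sock.recv(4096)


# Demo on a connected pair of local sockets (AF_UNIX on Unix-like systems).
a, b = socket.socketpair()
silent = wait_for_data(a, 0.1)   # nothing has been sent yet
b.sendall(b"ping")
reply = wait_for_data(a, 0.1)    # now there is data to read
a.close()
b.close()
```

The caller decides what a None result means, which is the "you take care of
this" part: the OS does nothing on your behalf here.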

That's what I meant: everything can be replaced by a timeout using select,
i.e. you take care of this yourself.

> > 2. Can't all this discussion be solved by simply reading the /proc
> > directory as a failsafe measure for whatever is the purpose of finding
> > out if a process is still alive?
>
> There is no 'failsafe' external to a program. One can decide that a
> program works or not by inspecting its inputs and outputs from time to
> time. If there are none for some time one can decide that it is dead.
> But this can be wrong. Really important programs are actually executed
> in parallel on different computers and the outputs compared all the
> time. When the outputs of three such computers are not the same then two
> out of three decide that the third is 'wrong'.
>
> In general, it should be obvious to anyone having some common sense that
> a 'small simple program or wrapper' cannot guess whether a 'large
> complex program' is doing what it is supposed to do.

You can come up with a creative solution, for example letting the user
decide on the length of the timeout. You can also define configurable
scenarios, such as checking /proc for some clue, etc. Or you can use a
technique that filters out false positives with compound event patterns.
For example, you can decide that no reply for 100 sec is an event. But this
event might not be enough on its own. Then you check what is up with the
data stream, and that is another event. Together they form a pattern
(timeout, noData, noSig), which is itself also an event. You can of course
add thresholds and probabilities, but we digress.

Essentially you are right that poking a system to see if it is alive can
bite your hand.
I wonder if there is a FOSS event engine out there aside from the one I
know from IBM.

>
> Peter

-- 
Regards,
        Tzahi.
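
P.S. A rough Python sketch of the compound-event idea, in case it helps. All
names here (Observation, proc_entry_exists, looks_dead) are made up for
illustration, and reading "noSig" as "no sign of the process under /proc" is
my own assumption; the /proc check is Linux-specific. The 100 second default
is the figure from the example above and is meant to be user-configurable.

```python
import os
from dataclasses import dataclass


@dataclass
class Observation:
    seconds_since_reply: float    # time since the watched process last replied
    bytes_since_last_check: int   # data seen on its stream since the last check
    pid: int                      # process being watched


def proc_entry_exists(pid):
    """Linux-specific clue: /proc/<pid> exists while the process exists."""
    return os.path.isdir(f"/proc/{pid}")


def looks_dead(obs, timeout_s=100.0, pid_exists=proc_entry_exists):
    """Compound event: fires only when all three simple events hold at once."""
    timeout = obs.seconds_since_reply >= timeout_s   # simple event 1
    no_data = obs.bytes_since_last_check == 0        # simple event 2
    no_sig = not pid_exists(obs.pid)                 # simple event 3 (assumed meaning)
    return timeout and no_data and no_sig
```

Requiring all three events cuts down false positives from a process that is
merely slow, at the price of never flagging one that hangs while still
listed under /proc; thresholds or probabilities would be the next refinement.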
-- 
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
