On Monday 19 February 2007 19:22, Peter wrote:
> On Mon, 19 Feb 2007, Tzahi Fadida wrote:
> > I did not understand the current problem, however, just giving my 2c
> > blindly so...:
> > 1. Internet sockets using the TCP protocol are the same as using select
> > with a timeout, since that is what the TCP protocol does: timeouts.
>
> No. The internet sockets have a close() call which notifies the other
> side that this side went down. They also have a timeout in case this
> does not happen (SO_LINGER etc), or if the other side was nuked. Local
> sockets have none of these properties. The OS assumes that whoever made
> the sockets will take care of them. There is no nanny.

That's what I meant: everything can be replaced by a timeout using select(),
i.e. you take care of this yourself.
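
To make the select-with-timeout idea concrete, here is a minimal sketch in
Python. It is only an illustration: the helper name wait_for_data and the
timeout values are my own assumptions, and a local socket pair stands in for
whatever sockets the real program uses.

```python
import select
import socket


def wait_for_data(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds.

    wait_for_data is a hypothetical helper name. The caller, not the OS,
    decides what a False result (silence for timeout_s seconds) means.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


def demo():
    # A connected pair of local sockets stands in for the real peers.
    ours, theirs = socket.socketpair()
    try:
        silent = not wait_for_data(ours, 0.05)   # peer has sent nothing yet
        theirs.sendall(b"ping")
        answered = wait_for_data(ours, 1.0)      # peer has now sent data
        return silent, answered
    finally:
        ours.close()
        theirs.close()
```

A False result only says the peer was silent for that long; deciding whether
silence means "dead" is still up to the caller.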

>
> > 2. Can't all this discussion be solved by simply reading the /proc
> > directory as a failsafe measure for whatever is the purpose of finding
> > out if a process is still alive?
>
> There is no 'failsafe' external to a program. One can decide that a
> program works or not by inspecting its inputs and outputs from time to
> time. If there are none for some time one can decide that it is dead.
> But this can be wrong. Really important programs are actually executed
> in parallel on different computers and the outputs compared all the
> time. When the outputs of three such computers are not the same then two
> out of three decide that the third is 'wrong'.
>
> In general, it should be obvious to anyone having some common sense that
> a 'small simple program or wrapper' cannot guess whether a 'large
> complex program' is doing what it is supposed to do.

You can come up with a creative solution, for example, letting the user
decide on the length of the timeout. You can also define configurable
scenarios, like checking /proc for some clue, etc. Or use a technique to
ignore false positives by using compound event patterns. For example, you can
decide that no reply for 100 sec is an event. But this event
might not be enough. Then you check what's up with the data stream, and that
is another event. Together they form a pattern (timeout, noData, noSig), which
is actually also an event. You can of course add thresholds and probabilities,
but we digress.
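
A rough sketch of the (timeout, noData, noSig) pattern in Python, under some
loud assumptions: the Observation fields, the DATA_TIMEOUT_S value, and the
reading of "noSig" as "no entry for the process under /proc" are all mine,
chosen just for illustration. Only the 100 sec reply timeout comes from the
text above.

```python
import os
from dataclasses import dataclass

REPLY_TIMEOUT_S = 100.0  # the 100 sec mentioned above
DATA_TIMEOUT_S = 100.0   # assumed; would be user-configurable in practice


@dataclass
class Observation:
    """One snapshot of what the watcher has seen (hypothetical structure)."""
    seconds_since_reply: float
    seconds_since_data: float
    proc_entry_exists: bool


def proc_entry_exists(pid):
    # Linux-specific clue: a live process has a directory /proc/<pid>.
    return os.path.isdir(f"/proc/{pid}")


def primitive_events(obs):
    """Evaluate each simple event on its own."""
    return {
        "timeout": obs.seconds_since_reply >= REPLY_TIMEOUT_S,
        "noData": obs.seconds_since_data >= DATA_TIMEOUT_S,
        "noSig": not obs.proc_entry_exists,  # assumed meaning of noSig
    }


def compound_event(obs):
    """The pattern fires only when all primitive events hold together."""
    return all(primitive_events(obs).values())
```

A single timeout does not fire the compound event, which is the point:
one clue alone is treated as a possible false positive.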

Essentially you are right that poking a system to see if it is alive can bite 
your hand.

I wonder if there is a FOSS event engine out there aside from the one I know
from IBM.

>
> Peter

-- 
Regards,
        Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html
