I agree. We are assuming TCP keepalive is enabled and that there is a watchdog on all synchronous remote method invocations that times out after 30 seconds.
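For illustration only, a watchdog of this kind could be sketched roughly as follows. This is not our actual code: `CallWatchdog` and its method names are hypothetical, and it assumes the remote call can be wrapped in a `Callable`. It only bounds how long the caller waits; a worker thread blocked in a socket read may ignore the interrupt and linger until the OS gives up on the connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds the time a caller waits on a blocking call. */
final class CallWatchdog {
    // Daemon threads so that a stuck call cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "call-watchdog");
        t.setDaemon(true);
        return t;
    });

    /** Runs the call on a worker thread and gives up after the timeout. */
    <T> T call(Callable<T> remoteCall, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(remoteCall);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort: interrupt the worker; blocked socket I/O may not react.
            future.cancel(true);
            throw e;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

A caller would use something like `watchdog.call(() -> proxy.someMethod(), 30, TimeUnit.SECONDS)`, where `proxy.someMethod()` stands in for any synchronous remote invocation, and treat `TimeoutException` as an indefinite failure.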
-----Original Message-----
From: Gregg Wonderly [mailto:[email protected]]
Sent: Wednesday, July 11, 2012 4:06 PM
To: [email protected]
Subject: Re: Question about LeaseRenewalManager and renewDuration

Yes, getting all of the stuff right with the LRM is the question, but I'm just suggesting that once you are satisfied with that, you might find that the usual TCP SYN timeout of 3-5 minutes is now in the way of getting 30sec responses.

There are lots of pieces to the puzzle, and I just wanted to point out that the LRM and TCP endpoints are usually not the most usable "connectivity" management solutions. I regularly use a TCP stream with keep alive on, and periodic traffic, or even UDP when I need "real-time" connectivity information.

What you are doing with the LRM can give you more information than you see by default, for sure. But, you may find that even it, can't give you the granularity that you're after.

Gregg Wonderly

On Jul 11, 2012, at 12:15 AM, Itai Frenkel wrote:

> Gregg,
>
> Let's assume for the sake of the discussion that an outgoing TCP connection
> throws ConnectException after 30 seconds in case the listener OS was abruptly
> shutdown without sending FIN, and also assume that any subsequent TCP
> connection to that listener would immediately throw ConnectException.
>
> This does not invalidate the basic question. And that is - how to configure
> an upper bound for the time it takes notification to "eventually" get to the
> client - once a temporary listener network unavailability has been resolved.
>
> Thanks,
> Itai
>
> -----Original Message-----
> From: Gregg Wonderly [mailto:[email protected]]
> Sent: Tuesday, July 10, 2012 9:45 PM
> To: [email protected]
> Subject: Re: Question about LeaseRenewalManager and renewDuration
>
> Recall, that under the covers there is also all the OS network stack
> behaviors. What is the TCP SYN timeout, for example; i.e. how long will a
> TCP connect request, which will eventually fail, take before failing?
>
> I think it's important to understand that unless you are on either end of a
> TCP connection, with timeout and keep alive settings for that connection,
> turned down to short intervals, that you're going to be mystified at the
> longer than expected timing of most failure detections.
>
> Subclassing the appropriate endpoint class, and adjusting its behavior and
> using that on your registrar may be part of what you need to do, to see quick
> notifications.
>
> Gregg Wonderly
>
> On Jul 10, 2012, at 10:49 AM, Greg Trasuk wrote:
>
>>
>> On Tue, 2012-07-10 at 10:14, Itai Frenkel wrote:
>>>>> Are you sure about that?
>>> Looking at RegistrarImpl when ThrowableConstants.retryable(e) returns
>>> BAD_OBJECT, it rethrows only if (e instanceof Error), otherwise it cancels
>>> the lease. Since ConnectException is not an Error the lease would be
>>> canceled.
>>> Why is the Error check being performed?
>>>
>> ThrowableConstants.retryable(e) only returns BAD_OBJECT if it
>> receives a definite response from the remote endpoint. For a comm
>> failure, it should return INDEFINITE. Having said that, the logic
>> seems to favour declaring an exception "Definite" where it might be
>> arguable. For instance, it will declare BAD_OBJECT in the case of a
>> "No route to host" exception, which arguably could be temporary, for
>> instance if a router goes offline.
>>
>>>>> Personally, I'd use an internal timer on the client side that says "if I
>>>>> don't receive any events for a given time, I'll cancel the current lease
>>>>> and re-register".
>>> That requires the Registrar to periodically send probe notifications. The
>>> number of real world notifications could fluctuate from zero to high load
>>> and cannot be trusted without probe notifications.
>>>
>> Might be an interesting improvement if a client could request a
>> heartbeat or supervisory message from the registrar.
>> But my point above was that if the events are not coming fast enough to
>> satisfy a reasonable "liveness" timeout, then it's probably not a big
>> problem if the client simply cancels the lease and re-registers. So you
>> could effectively implement your own heartbeat.
>>
>> Alternately (subject to exploring the loading and the number of
>> clients) you could create a service that does nothing but registers,
>> then updates its service attributes periodically, which would have
>> the effect of generating registrar messages. Starting to get a
>> little complicated and indirect, though.
>>
>> In the end, however, it seems like you're trying to have the client
>> find out that it's not receiving registrar notifications. I can't
>> think of any better evidence than "you're not receiving registrar
>> notifications".
>>
>> Cheers,
>>
>> Greg.
>>
>>> Thanks,
>>> Itai
>>>
>>> -----Original Message-----
>>> From: Greg Trasuk [mailto:[email protected]]
>>> Sent: Tuesday, July 10, 2012 4:36 PM
>>> To: [email protected]
>>> Subject: Re: Question about LeaseRenewalManager and renewDuration
>>>
>>>
>>> On Tue, 2012-07-10 at 06:41, Itai Frenkel wrote:
>>> <snip...>
>>>> Background Information:
>>>> The motivation for this is the way the Registrar handles event
>>>> notifications.
>>>> When the Registrar fails to send a notification to a listener due
>>>> to a temporary network glitch, it assumes the listener is no longer
>>>> available and cancels the event lease.
>>>
>>> Are you sure about that? Looking through
>>> com.sun.jini.reggie.RegistrarImpl, it appears that when an exception occurs
>>> during event notification, the code tries to categorize the exception as
>>> either "definite" (no such event, no such object, etc) or "indefinite"
>>> (communications failure). Then it only cancels the lease on a definite
>>> exception.
>>>
>>> In other words, the lease is maintained in the case of a temporary network
>>> failure.
>>> After all, that's the whole point of the lease: it represents an
>>> agreement between the client and service that resources are going to be
>>> maintained for a definite time period.
>>>
>>> Personally, I'd use an internal timer on the client side that says "if I
>>> don't receive any events for a given time, I'll cancel the current lease
>>> and re-register". If the events are that quiet, then clearly the registrar
>>> is not that heavily loaded, so the overhead of cancelling the lease and
>>> creating a new registration should not be too bad. You'd want to test it
>>> under simulated load, of course.
>>>
>>> Cheers,
>>>
>>> Greg.
>>> --
>>> Greg Trasuk, President
>>> StratusCom Manufacturing Systems Inc. - We use information technology to
>>> solve business problems on your plant floor.
>>> http://stratuscom.com
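
The client-side timer suggested in the thread ("if I don't receive any events for a given time, I'll cancel the current lease and re-register") might be sketched along these lines. This is a minimal sketch under assumptions, not code from River or Reggie: `EventLivenessTimer` is a hypothetical class, and the `reRegister` callback is assumed to cancel the old event lease and create a new registration with the registrar. The quiet period then acts as the upper bound on how long a lost registration goes unnoticed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical liveness timer: fires when no events arrive for a quiet period. */
final class EventLivenessTimer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "event-liveness");
                t.setDaemon(true);
                return t;
            });
    private final long quietMillis;
    private final Runnable reRegister; // assumed: cancels old lease, registers again
    private ScheduledFuture<?> pending;
    private boolean stopped;

    EventLivenessTimer(long quietMillis, Runnable reRegister) {
        this.quietMillis = quietMillis;
        this.reRegister = reRegister;
    }

    /** Starts the countdown; call once after the initial registration. */
    synchronized void start() {
        arm();
    }

    /** Call from the event listener whenever a notification arrives. */
    synchronized void eventReceived() {
        arm();
    }

    /** Stops the timer; no further re-registrations are triggered. */
    synchronized void stop() {
        stopped = true;
        scheduler.shutdownNow();
    }

    // Must be called with the lock held: restarts the countdown.
    private void arm() {
        if (stopped) {
            return;
        }
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(this::onQuiet, quietMillis, TimeUnit.MILLISECONDS);
    }

    // Runs on the scheduler thread when the quiet period elapses.
    private void onQuiet() {
        try {
            reRegister.run();
        } finally {
            synchronized (this) {
                arm(); // keep watching the new registration
            }
        }
    }
}
```

As noted in the thread, a quiet registrar will trigger periodic re-registrations even when nothing is wrong, so the quiet period trades registrar load against detection latency and should be tested under simulated load.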
