I think it would help to provide code examples showing the two things that 
can be done to make this less visible.  First, show the reader how to use an 
exit hook in the tutorial so that the service registration disappears 
promptly.  Second, show them how to use the lease timeout value so that the 
change happens automatically after a network split, or a network card or 
computer failure, any of which would keep the exit hook from ever generating 
the network traffic needed to cancel the lease.
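
As a rough illustration of those two cases, here is a minimal sketch in plain 
Java.  ToyRegistrar and its methods are hypothetical stand-ins invented for 
this example, not the River/Jini API; a real service would go through 
JoinManager and the lease classes instead.  The 30-second lease is likewise 
an assumed value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model of a lookup service; NOT the River/Jini API.
final class ToyRegistrar {
    // service name -> absolute lease expiry time, in milliseconds
    private final Map<String, Long> expirations = new ConcurrentHashMap<>();

    /** Registers a service and grants a lease lasting leaseMillis from 'now'. */
    void register(String name, long now, long leaseMillis) {
        expirations.put(name, now + leaseMillis);
    }

    /** Renews a live lease; returns false if it has already lapsed. */
    boolean renew(String name, long now, long leaseMillis) {
        Long expiry = expirations.get(name);
        if (expiry == null || expiry <= now) {
            expirations.remove(name);
            return false;
        }
        expirations.put(name, now + leaseMillis);
        return true;
    }

    /** Explicit cancellation: the path an exit hook would take. */
    void cancel(String name) {
        expirations.remove(name);
    }

    /** Drops lapsed registrations, then reports whether 'name' is still listed. */
    boolean isRegistered(String name, long now) {
        expirations.values().removeIf(expiry -> expiry <= now);
        return expirations.containsKey(name);
    }
}

class LeaseDemo {
    public static void main(String[] args) {
        ToyRegistrar registrar = new ToyRegistrar();
        long start = System.currentTimeMillis();
        long leaseMillis = 30_000L; // assumed lease duration, for illustration only

        registrar.register("hello-service", start, leaseMillis);
        registrar.register("crashed-service", start, leaseMillis);

        // Case 1: an orderly exit runs this hook, which cancels the
        // registration immediately instead of waiting for the lease to lapse.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            registrar.cancel("hello-service");
            System.out.println("exit hook: hello-service cancelled");
        }));

        // Case 2: no hook ever runs (network split, hardware failure), so the
        // entry stays listed until the un-renewed lease reaches its expiry.
        System.out.println("just before expiry: "
                + registrar.isRegistered("crashed-service", start + leaseMillis - 1));
        System.out.println("at expiry: "
                + registrar.isRegistered("crashed-service", start + leaseMillis));
    }
}
```

JVM shutdown hooks do run on ctrl-c, but not when the process is killed 
outright or the machine drops off the network, which is why the lease timeout 
is still needed as the backstop.  A shorter lease narrows the window in which 
a dead service stays listed, at the cost of more renewal traffic.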

I still feel that we actually need new APIs that operate at a bit higher level 
and provide all of these things as parameters to richer functions.

Gregg Wonderly

> On Apr 6, 2015, at 8:56 PM, Greg Trasuk <tras...@stratuscom.com> wrote:
> 
> 
> Hi all:
> 
> I updated the tutorial to include the discussion below in the “hello-service” 
> module.  ‘svn up’ should bring it down to your local machine.  I haven’t yet 
> integrated Patricia’s formatting suggestions, mainly because I have to dig in 
> to Maven’s site command a bit to include the correct css, but I’ll do that 
> before we release.
> 
> Any feedback is greatly appreciated.
> 
> Cheers,
> 
> Greg Trasuk
> 
> On Apr 6, 2015, at 3:30 PM, Greg Trasuk <tras...@stratuscom.com> wrote:
> 
>> 
>> Hi Dan:
>> 
>> Thanks for the great feedback.  
>> 
>> I’m pretty sure you already know this, Dan, since you’re a long-time Jini 
>> user, but let me explain for the newer folks and the archives.  This is a 
>> case where what you’re seeing is the expected behaviour.  When the service 
>> registers itself with Reggie, it takes out a lease on the registration. That 
>> lease is usually renewed periodically by the service’s JoinManager (that 
>> isn’t quite the whole story, but it’ll do for now).  When you kill the 
>> service unexpectedly with ctrl-c, the service doesn’t de-register itself; 
>> however, the lease eventually runs out (now that it’s not being renewed by 
>> the service) and then the registration expires, allowing Reggie to reclaim 
>> its resources and notify any registrar listeners. 
>> 
>> It would be possible to register a vm shutdown hook to de-register the 
>> service before the vm exits, but in this case I think it’s actually better 
>> to leave it out, since it demonstrates nicely that a dead service (or at 
>> least a dead JoinManager) eventually gets dropped from the registrar.
>> 
>> You said the duplicate service instances “worked”, in that you can show info 
>> and browse the service, but of course, you’re really just looking at the 
>> information that’s in the registry - the registrar and service browser don’t 
>> actually contact the service.  Reggie has no knowledge of the “liveness” of 
>> the service, and doesn’t attempt to do any “health check”.  
>> 
>> In fact, it’s a common misconception that if the service renews the lease, 
>> it must be “live”.  This turns out to be false for many reasons.  (1) The 
>> service could have delegated its lease renewals to a different service.  (2) 
>> There’s no guarantee that failure of the actual service thread would also 
>> cause failure of the lease renewal thread, even if they are in the same 
>> process (embedded programmers might recognize this as being similar to the 
>> “resetting the watchdog in a timer-triggered interrupt service routine” 
>> problem).  (3) Even if there were a health check task, the service could 
>> fail in the instant just after the health check.  The most a health check, 
>> monitor or heartbeat can do is place a limit on how long it takes to find 
>> out a service has failed.  The only way to say with certainty that a service 
>> “works” is to attempt to use it.
>> 
>> The lease is purely for the convenience of the registrar (or generically, 
>> the service granting the lease).  If ever the lease is not renewed, the 
>> landlord can go ahead and reclaim whatever resources were dedicated to the 
>> lease.  In the case of Reggie, if the lease isn’t renewed, Reggie drops the 
>> registration.  So there’s little risk of “stuck registrations”.  And since 
>> the lease can be renewed, there’s no need for any kind of extended default 
>> timeout.
>> 
>> So, I think I’ll put most of the above explanation into the tutorial, unless 
>> anyone has other thoughts.
>> 
>> Cheers,
>> 
>> Greg Trasuk
>> 
>> On Apr 6, 2015, at 1:42 PM, Dan Rollo <danro...@gmail.com> wrote:
>> 
>>> Hi Greg,
>>> 
>>> I finally took some time to try this out. It really looks great to me!
>>> 
>>> I noticed one minor thing that I thought might confuse users: While going 
>>> through tutorial steps, I decided to stop (via ctrl+c) and restart the 
>>> hello-service a couple times. This resulted in the service being shown 
>>> multiple times in the service browser (screenshot attached). It appeared 
>>> all the duplicate instances in the browser “worked” (I could “show info” 
>>> and “browse service” on all of them). Eventually, the duplicate 
>>> registrations “cleaned up” and I was left with just one. I’m not sure how 
>>> best to avoid confusion about this situation. Would more doc about 
>>> “why”/“how” that works just complicate things? Is there any sort of “force 
>>> lease check” to do in the browser that could clear up the duplicates 
>>> sooner? (And if so, would that be worth noting in the tutorial?). So 
>>> basically, not sure this is a “problem”, but thought I’d ask…
>>> 
>>> Thanks!
>>> Dan
>>> 
>>> <revier-examples-RepeatedService.png>
>> 
> 
