[ 
https://issues.apache.org/jira/browse/RIVER-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Firmstone resolved RIVER-337.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: River_3.0.0

If this occurs, it is logged. However, it is very unlikely to occur, as a number 
of race conditions have been fixed in SDM since.
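
For readers without the source at hand, the "logged instead of thrown" behaviour can be sketched roughly as below. This is only an illustration under stated assumptions, not the River 3.0.0 code: the class DiscardSketch, its methods, and the use of plain strings in place of registrar proxies are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the SDM's registrar bookkeeping; not River source.
final class DiscardSketch {
    private static final Logger LOGGER =
            Logger.getLogger(DiscardSketch.class.getName());

    // Known registrars, modelled as plain strings for illustration only.
    private final List<String> known = new ArrayList<>();

    void add(String registrar) {
        known.add(registrar);
    }

    int size() {
        return known.size();
    }

    // Removes every discarded registrar that is known and returns the ones dropped.
    // An unknown registrar is logged and skipped rather than raising an exception.
    List<String> discarded(List<String> registrars) {
        List<String> drops = new ArrayList<>();
        for (String registrar : registrars) {
            if (known.remove(registrar)) {
                drops.add(registrar);
            } else {
                LOGGER.log(Level.WARNING,
                        "Discard of unknown registrar ignored: {0}", registrar);
            }
        }
        return drops;
    }
}
```

With one known registrar, a discard that names both that registrar and an unknown one drops the known entry, logs a warning for the other, and returns normally to the caller.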

> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
> --------------------------------------------------------------------------
>
>                 Key: RIVER-337
>                 URL: https://issues.apache.org/jira/browse/RIVER-337
>             Project: River
>          Issue Type: Bug
>          Components: net_jini_discovery, net_jini_lookup
>    Affects Versions: jtsk_2.1, River_2.1.1
>            Reporter: Chris Dolan
>             Fix For: River_3.0.0
>
>
> The method
> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
> has the following code, which throws a RuntimeException (the code comment 
> suggests that it is supposed to be impossible, but it's not):
>         ProxyReg reg = findReg(proxys[i]);
>         if(reg != null ) { // this check can be removed.
>             proxyRegSet.remove(proxyRegSet.indexOf(reg));
>             drops.add(reg);
>         } else {
>             throw new RuntimeException("discard error");
>         }//endif
> Our QA does failover testing with two servers, each with a Reggie, where we 
> deliberately crash and reboot server 1 then server 2 every 30 minutes 
> continuously.  In one case, we hit that RuntimeException.  I don't know why 
> we got a null reg (that's a problem for another defect, maybe an undiagnosed 
> race of two discards put on a task queue?  Maybe related to RIVER-37?).  But 
> it caused a catastrophic chain of events because the RuntimeException is not 
> caught anywhere up the stack.  In our case, it killed the 
> LookupLocatorDiscovery$Notifier thread.
> java.lang.RuntimeException: discard error
>       at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>       at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>       at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>       at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>       at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
>       at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
> I propose three changes:
>   1) change the discarded() method above to simply warn instead of throwing
>   2) put a try/catch(Throwable) around the listener invocation in 
>      LookupLocatorDiscovery$Notifier.run()
>   3) put a similar try/catch around the listener invocation in 
>      LookupDiscoveryManager.notifyListener
> The idea behind #2 and #3 is that misbehaving listeners should not be allowed 
> to derail the discovery process.
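
The listener isolation proposed in #2 and #3 can be sketched as follows. This is a minimal illustration, not the River implementation: IsolatingNotifier, its nested Listener interface, and the string-valued registrar are hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical notifier; the names here are illustrative, not River API.
final class IsolatingNotifier {
    private static final Logger LOGGER =
            Logger.getLogger(IsolatingNotifier.class.getName());

    // Simplified listener: receives the identifier of a discarded registrar.
    interface Listener {
        void discarded(String registrar);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Notifies every listener in turn and returns how many completed normally.
    // A listener that throws is logged and skipped, so the notifying thread
    // survives and the remaining listeners are still called.
    int notifyDiscarded(String registrar) {
        int completed = 0;
        for (Listener listener : listeners) {
            try {
                listener.discarded(registrar);
                completed++;
            } catch (Throwable t) {
                LOGGER.log(Level.WARNING, "Listener failed; continuing", t);
            }
        }
        return completed;
    }
}
```

Catching Throwable follows the wording of the proposal. It also swallows Errors such as OutOfMemoryError, so a stricter variant might catch only Exception and let Errors propagate.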



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
