[ 
https://issues.apache.org/jira/browse/DOSGI-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amichai Rothman resolved DOSGI-173.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.5.0
         Assignee: Amichai Rothman

The first part of this issue, incorrect closing of exported services, is fixed 
by DOSGI-180.

The second part, removing unnecessary recreation of InterfaceMonitors, is 
committed.
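
To illustrate the second part, here is a minimal sketch of reusing one monitor per scope instead of closing and recreating it on every call. It is not the committed change: ScopeMonitor and ScopeMonitorManager are hypothetical stand-ins for InterfaceMonitor and InterfaceMonitorManager, and the interest counting in removeInterest() is an assumption added only to make the sketch complete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a monitor that watches one ZooKeeper node. */
final class ScopeMonitor {
    final String scope;
    boolean running;

    ScopeMonitor(String scope) { this.scope = scope; }

    void start() { running = true; }   // a real monitor would set its ZK watches here
    void close() { running = false; }  // a real monitor would notify listeners of removals here
}

/** Hypothetical manager that keeps at most one running monitor per scope. */
final class ScopeMonitorManager {
    private final Map<String, ScopeMonitor> monitors = new HashMap<>();
    private final Map<String, Integer> interestCounts = new HashMap<>();
    final List<String> log = new ArrayList<>();  // records every start/close for inspection

    synchronized void addInterest(String scope) {
        interestCounts.merge(scope, 1, Integer::sum);
        // Start a monitor only the first time this scope is seen;
        // later calls leave the existing monitor running untouched.
        if (!monitors.containsKey(scope)) {
            ScopeMonitor monitor = new ScopeMonitor(scope);
            monitor.start();
            monitors.put(scope, monitor);
            log.add("start " + scope);
        }
    }

    // Assumption: the monitor is closed only when the last interest goes away.
    synchronized void removeInterest(String scope) {
        Integer count = interestCounts.get(scope);
        if (count == null) {
            return;  // unknown scope, nothing to do
        }
        if (count > 1) {
            interestCounts.put(scope, count - 1);
            return;
        }
        interestCounts.remove(scope);
        ScopeMonitor monitor = monitors.remove(scope);
        if (monitor != null) {
            monitor.close();
            log.add("close " + scope);
        }
    }

    synchronized ScopeMonitor monitorFor(String scope) {
        return monitors.get(scope);
    }
}

class ScopeMonitorDemo {
    public static void main(String[] args) {
        ScopeMonitorManager manager = new ScopeMonitorManager();
        String scope = "(objectClass=org.example.Foo)";  // made-up filter
        manager.addInterest(scope);
        manager.addInterest(scope);  // reuses the monitor started above
        manager.removeInterest(scope);
        manager.removeInterest(scope);
        System.out.println(manager.log);
    }
}
```

With this shape, repeated addInterest() calls for the same scope cause no extra ZooKeeper accesses and no gap in monitoring, because the first monitor is never closed in between.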
                
> unregistering an exported service does not remove it from zookeeper (and 
> remote clients)
> ----------------------------------------------------------------------------------------
>
>                 Key: DOSGI-173
>                 URL: https://issues.apache.org/jira/browse/DOSGI-173
>             Project: CXF Distributed OSGi
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Amichai Rothman
>            Assignee: Amichai Rothman
>             Fix For: 1.5.0
>
>         Attachments: fix_zk_unregisteration.diff
>
>
> I have some bundles exporting and consuming services, running on two hosts. 
> I've noticed more than once that while stopping and starting different 
> bundles on the two hosts (just playing around with them manually to see how 
> robust the distributed system is), at some point one of the hosts doesn't see 
> that a service it was using from the other host is down. Connecting to 
> ZooKeeper directly, I see the node for that service is still there, i.e. the 
> service was not properly removed from ZK even though the bundle is stopped 
> and service is gone.
> Investigating this is a bit tricky, since it involves various trackers, 
> endpoint listeners and service listeners and there is not enough code 
> documentation to understand what the intended flow is... however, I have a 
> few related findings that may point to the solution:
> 1. Following the logs and some debugging, it appears that the problem is not 
> with the discovery.zookeeper package/bundle itself, since the endpoint 
> removed event never gets there.
> 2. In EndpointListenerNotifier.notifyListenersOfRemoval(), the 
> EndpointDescription appears to be null, so there is never a filter match and 
> the endpointRemoved callback is never triggered on the EndpointListeners. 
> This is because all of the ExportRegistrations are already closed by the time 
> they get there. It seems that the premature closing is done by the service 
> tracker created in ExportRegistrationImpl.startServiceTracker(). My guess is 
> that the order in which the service tracker and service listener (in 
> TopologyManagerExport, which triggers the EndpointListenerNotifier) receive 
> the events is arbitrary depending on some race condition somewhere, which may 
> explain why this is an inconsistently reproducible bug. I would like to say 
> that the solution is to get rid of the service tracker altogether (it doesn't 
> do anything else, and as a separate bug, is never closed), but I'm not sure 
> why it was introduced in the first place or if there are any other scenarios 
> in which it was necessary, so I really don't know what the proper solution 
> should be.
> 3. Another element that may have been masking this bug to some degree is the 
> local discovery bundle which was running, and during debugging I saw it 
> triggering some EndpointListener removal events which were picked up by the 
> other components. I'm not entirely sure yet of what this bundle does (I 
> didn't find any mention of it on the website, and didn't get to the code 
> yet), but I just leave this bundle in the stopped state for now, with no 
> visible effects on the testing, making debugging easier.
> 4. An additional related issue which bugged me during a previous code review 
> was that InterfaceMonitorManager.addInterest() is closing and recreating an 
> InterfaceMonitor every time it is invoked with an existing scope, even though 
> the old and new IMs monitor the same ZK node and are practically identical - 
> so why not just leave the old monitor running? This replacement causes a 
> bunch of unnecessary extra work (including several ZK server accesses), a 
> flurry of unnecessary filter-matching logs, and an unnecessary gap in 
> monitoring for ZK changes. This also relates to the bug at hand since 
> InterfaceMonitor.close() also sends some EndpointListener notifications about 
> the endpoints being removed, which leaves some gaps in the registration 
> coverage (before they are re-added moments later) and might interact in some 
> other unpredictable (at least to me) way with the rest of the mechanism. It 
> seems these IM close/start cycles sometimes occur tens of times in a row.
> To sum it up, there's definitely a bug occurring. When I tested a bit with 
> fixes for both potential causes above (IM stop/start replaced with a single 
> start the first time a given scope is encountered, and close invocation in 
> service tracker removed) - I could no longer recreate the bug, but I don't 
> understand all the component interactions well enough to know if there are 
> any side effects, or why they were implemented this way in the first place (I 
> tend to assume there was a good reason for it which I'm unaware of).
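
The ordering problem described in finding 2 can be sketched as follows. This is an illustration only, not the DOSGI-180 fix: Endpoint, Registration, RemovalListener and ExportManager are hypothetical names, and the sketch assumes a single removal path that reads the endpoint description before closing the registration, so the notification no longer depends on which of two independent listeners fires first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an endpoint description. */
final class Endpoint {
    final String id;
    Endpoint(String id) { this.id = id; }
}

/** Hypothetical stand-in for an export registration. */
final class Registration {
    private Endpoint endpoint;

    Registration(Endpoint endpoint) { this.endpoint = endpoint; }

    Endpoint getEndpoint() { return endpoint; }  // returns null once closed
    void close() { endpoint = null; }
}

/** Hypothetical callback, playing the role of an endpoint listener. */
interface RemovalListener {
    void endpointRemoved(Endpoint endpoint);
}

/** Hypothetical manager with one removal path: notify first, then close. */
final class ExportManager {
    private final Map<Long, Registration> exports = new HashMap<>();
    private final List<RemovalListener> listeners = new ArrayList<>();

    synchronized void addListener(RemovalListener listener) {
        listeners.add(listener);
    }

    synchronized void export(long serviceId, String endpointId) {
        exports.put(serviceId, new Registration(new Endpoint(endpointId)));
    }

    synchronized void serviceUnregistered(long serviceId) {
        Registration registration = exports.remove(serviceId);
        if (registration == null) {
            return;  // not exported, or already handled
        }
        // Read the description while the registration is still open.
        Endpoint endpoint = registration.getEndpoint();
        if (endpoint != null) {
            for (RemovalListener listener : listeners) {
                listener.endpointRemoved(endpoint);
            }
        }
        // Close only after every listener has been told about the removal.
        registration.close();
    }
}

class ExportRemovalDemo {
    public static void main(String[] args) {
        ExportManager manager = new ExportManager();
        manager.addListener(e -> System.out.println("removed " + e.id));
        manager.export(42L, "http://host-a:9000/greeter");  // made-up endpoint id
        manager.serviceUnregistered(42L);
    }
}
```

If close() ran before the notification, getEndpoint() would return null and no listener would be called, which matches the symptom of the node staying in ZooKeeper.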
