[ https://issues.apache.org/jira/browse/DOSGI-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amichai Rothman resolved DOSGI-173.
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.5.0
Assignee: Amichai Rothman
The first part of this issue, exported services being closed incorrectly, is
fixed by DOSGI-180.
The second part, removing the unnecessary recreation of InterfaceMonitors, has
been committed.
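The second part of the fix amounts to starting one monitor per scope and reusing it on later calls, instead of closing and recreating it each time. The following is a minimal Java sketch of that idea, not the committed DOSGi code: ScopeMonitor and ScopeMonitorManager are hypothetical stand-ins for InterfaceMonitor and InterfaceMonitorManager, and the actual ZooKeeper watching and listener notification are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for InterfaceMonitor: watches a single ZooKeeper node.
class ScopeMonitor {
    final String zkPath;
    boolean running;

    ScopeMonitor(String zkPath) {
        this.zkPath = zkPath;
    }

    void start() {
        running = true; // a real monitor would register ZooKeeper watches here
    }

    void close() {
        running = false; // a real monitor would also send endpoint-removed notifications
    }
}

// Hypothetical stand-in for InterfaceMonitorManager: one monitor per scope,
// started only the first time that scope is seen.
class ScopeMonitorManager {
    private final Map<String, ScopeMonitor> monitors = new HashMap<>();
    int starts; // number of monitors actually started, for illustration

    synchronized ScopeMonitor addInterest(String scope, String zkPath) {
        ScopeMonitor existing = monitors.get(scope);
        if (existing != null) {
            // Same scope, same ZK node: keep the running monitor, so there is
            // no close/start cycle, no removal notifications and no monitoring gap.
            return existing;
        }
        ScopeMonitor created = new ScopeMonitor(zkPath);
        created.start();
        starts++;
        monitors.put(scope, created);
        return created;
    }

    synchronized void removeInterest(String scope) {
        ScopeMonitor monitor = monitors.remove(scope);
        if (monitor != null) {
            monitor.close(); // closed only when the scope is no longer of interest
        }
    }
}
```

With this shape, calling addInterest() repeatedly for the same scope returns the same running monitor, and the monitor is closed only when removeInterest() drops the scope.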
> unregistering an exported service does not remove it from zookeeper (and
> remote clients)
> ----------------------------------------------------------------------------------------
>
> Key: DOSGI-173
> URL: https://issues.apache.org/jira/browse/DOSGI-173
> Project: CXF Distributed OSGi
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Amichai Rothman
> Assignee: Amichai Rothman
> Fix For: 1.5.0
>
> Attachments: fix_zk_unregisteration.diff
>
>
> I have some bundles exporting and consuming services, running on two hosts.
> I've noticed more than once that while stopping and starting different
> bundles on the two hosts (just playing around with them manually to see how
> robust the distributed system is), at some point one of the hosts doesn't see
> that a service it was using from the other host is down. Connecting to
> ZooKeeper directly, I see that the node for that service is still there, i.e.
> the service was not properly removed from ZK even though the bundle is
> stopped and the service is gone.
> Investigating this is a bit tricky, since it involves various trackers,
> endpoint listeners and service listeners, and there is not enough code
> documentation to understand what the intended flow is. However, I've found a
> few related findings that may point to the solution:
> 1. Following the logs and some debugging, it appears that the problem is not
> in the discovery.zookeeper package/bundle itself, since the endpoint removed
> event never reaches it.
> 2. In EndpointListenerNotifier.notifyListenersOfRemoval(), the
> EndpointDescription appears to be null, so there is never a filter match and
> the endpointRemoved callback is never triggered on the EndpointListeners.
> This is because all of the ExportRegistrations are already closed by the time
> they get there. The premature closing seems to be done by the service
> tracker created in ExportRegistrationImpl.startServiceTracker(). My guess is
> that the order in which the service tracker and the service listener (in
> TopologyManagerExport, which triggers the EndpointListenerNotifier) receive
> the events is arbitrary, depending on a race condition somewhere, which may
> explain why the bug is only intermittently reproducible. I would like to say
> that the solution is to get rid of the service tracker altogether (it does
> nothing else and, as a separate bug, is never closed), but I'm not sure
> why it was introduced in the first place or whether there are other scenarios
> in which it is necessary, so I really don't know what the proper solution
> should be.
> 3. Another element that may have been masking this bug to some degree is the
> local discovery bundle, which was running; during debugging I saw it
> trigger some EndpointListener removal events that were picked up by the
> other components. I'm not yet sure what this bundle does (I didn't find any
> mention of it on the website and haven't looked at the code yet), but for
> now I leave it in the stopped state; this has no visible effect on the
> testing and makes debugging easier.
> 4. A related issue that bugged me during a previous code review is that
> InterfaceMonitorManager.addInterest() closes and recreates an
> InterfaceMonitor every time it is invoked with an existing scope, even though
> the old and new IMs monitor the same ZK node and are practically identical.
> Why not just leave the old monitor running? This replacement causes a
> lot of unnecessary extra work (including several ZK server accesses), a
> flurry of unnecessary filter-matching logs, and an unnecessary gap in
> monitoring for ZK changes. It also relates to the bug at hand, since
> InterfaceMonitor.close() sends EndpointListener notifications about the
> endpoints being removed, which leaves gaps in the registration coverage
> (before they are re-added moments later) and might interact in other
> unpredictable (at least to me) ways with the rest of the mechanism. These
> IM close/start cycles sometimes occur tens of times in a row.
> To sum up, there is definitely a bug. When I tested with fixes for both
> potential causes above (replacing the IM close/start cycle with a single
> start the first time a given scope is encountered, and removing the close()
> invocation from the service tracker), I could no longer reproduce the bug,
> but I don't understand all the component interactions well enough to know
> whether there are any side effects, or why they were implemented this way in
> the first place (I tend to assume there was a good reason I'm unaware of).
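Item 2 of the description is an ordering problem: if the service tracker closes the ExportRegistration before the TopologyManagerExport listener calls the notifier, the EndpointDescription is already null and no EndpointListener filter can match. The Java sketch below reproduces only that ordering dependency, using hypothetical stand-in classes (Registration for ExportRegistration, RemovalNotifier for EndpointListenerNotifier, and a String plus Predicate<String> in place of an EndpointDescription and an LDAP filter). It is not the DOSGI-180 fix itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for ExportRegistration: closing it clears the description.
class Registration {
    private String description; // stands in for EndpointDescription

    Registration(String description) {
        this.description = description;
    }

    String getDescription() {
        return description; // null once the registration has been closed
    }

    void close() {
        description = null;
    }
}

// Hypothetical stand-in for EndpointListenerNotifier.notifyListenersOfRemoval().
class RemovalNotifier {
    final List<String> removed = new ArrayList<>(); // removals the listener actually saw
    private final Predicate<String> listenerFilter; // stands in for the listener's scope filter

    RemovalNotifier(Predicate<String> listenerFilter) {
        this.listenerFilter = listenerFilter;
    }

    void notifyRemoval(Registration registration) {
        String description = registration.getDescription();
        // A null description can never match, so the listener is silently skipped.
        if (description != null && listenerFilter.test(description)) {
            removed.add(description);
        }
    }
}
```

If close() runs before notifyRemoval(), the listener's removed list stays empty; if the notification runs first, the removal is recorded. Which of the two happens depends only on the order of the calls, which matches the intermittent behaviour the report describes.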
--
This message is automatically generated by JIRA.