[
https://issues.apache.org/jira/browse/FELIX-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078673#comment-17078673
]
Scott Lewis commented on FELIX-6190:
------------------------------------
[~cziegeler]: I agree this looks like the same problem. FYI, I've also just
brought this up on the eclipse.org bug
[https://bugs.eclipse.org/bugs/show_bug.cgi?id=551799]
Unfortunately, ECF remote services has already worked around this deadlock in
its released code, so it's going to take a little doing to get back to a
situation where this fix can be tested via ECF remote services in my local
environment. I can't do this right away, but perhaps [~rmoquin] can. I'm
also hopeful that the fix for FELIX-6252 does remove deadlock potential wrt
service event hooks usage.
> Declarative services component implementing EventHookListener deadlocks SCR.
> ----------------------------------------------------------------------------
>
> Key: FELIX-6190
> URL: https://issues.apache.org/jira/browse/FELIX-6190
> Project: Felix
> Issue Type: Bug
> Components: Declarative Services (SCR)
> Affects Versions: scr-2.1.16
> Reporter: Ryan Moquin
> Priority: Minor
>
> When a declarative services component that implements EventHookListener is
> loaded by SCR, a deadlock occurs. This happens because SCR attempts to get
> the service so it can deliver event notifications to it while it is still
> in the process of loading that service. Below is a breakdown of the
> deadlock stack trace we ran into. I spent some time identifying the services
> being interacted with at the various stages in the thread stack traces to
> come to this conclusion. After some thinking, it seems like the fix would be
> to check whether an EventHookListener that needs to be loaded matches the
> service that is in the process of being loaded. I THINK that would prevent
> this deadlock from occurring. This problem can be worked around, but it is
> confusing when it occurs. Scott Lewis (who runs the ECF project) said it was
> intermittent for him. I ran into it with Equinox first, switched to Felix,
> and then ran into it every time I ran the project using an exported bndtools
> jar with ECF. Scott initially logged this against Equinox and there was some
> discussion there. I'm attaching that issue to this one in case it is useful.
> In the below breakdown and stacktraces, the TopologyManager class (from the
> ECF project) is being loaded by the SCR. That class implements the
> EventHookListener interface:
>
> Main thread:
> SCR tries to register the TopologyManager
> Service event type 1 is fired
> Equinox/Felix iterates the event listener hooks for which the TopologyManager
> is one, so it tries to get the TopologyManager service (to do the
> notification).
> an attempt to retrieve the service count service to update the change count
> ComponentRegistry updateChangeCount method is called
> locks on monitor changeCountLock
>
> Timer Thread 0:
> ComponentRegistry locks the changeCountLock
> SCR service, properties modified - service.changecount
> fires event 2
> tries to retrieve TopologyManager, because it's an EventListenerHook that
> must be notified of the event
> then waits on servicecount latch for static class in ServiceHolder
>
> Stack trace from Scott. I didn't save the stack traces from the threads I was
> investigating, but I can easily get them if my explanation above isn't
> helpful enough to reproduce with.
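
For anyone trying to picture the cycle in the breakdown above, here is a minimal stand-alone sketch of the same wait pattern. It is only a model under stated assumptions: it is not SCR or framework code, every name in it (HookDeadlockSketch, reproducesCycle, activationDone, timerHoldsLock) is hypothetical, and changeCountLock is just a stand-in for the monitor of the same name. Timeouts replace the indefinite waits so the program finishes instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone model; none of these names are Felix or Equinox classes.
class HookDeadlockSketch {

    /** Returns true when both threads give up waiting, i.e. the wait cycle exists. */
    static boolean reproducesCycle(long timeoutMillis) {
        // Stand-in for the changeCountLock monitor named in the breakdown.
        ReentrantLock changeCountLock = new ReentrantLock();
        // Stand-in for the latch that would open once the component is activated.
        // It is never counted down here, because activation cannot complete.
        CountDownLatch activationDone = new CountDownLatch(1);
        CountDownLatch timerHoldsLock = new CountDownLatch(1);
        boolean[] timerTimedOut = new boolean[1];

        Thread timer = new Thread(() -> {
            changeCountLock.lock();
            try {
                timerHoldsLock.countDown();
                // Event 2 needs the hook service, whose activation has not finished.
                timerTimedOut[0] =
                        !activationDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                changeCountLock.unlock();
            }
        }, "timer-thread-0");
        timer.start();

        try {
            timerHoldsLock.await();
            // Main thread: mid-activation, it now needs the change-count lock.
            // It gives up after half the timer's timeout, so the timer thread
            // should still be holding the lock when this attempt expires.
            boolean mainTimedOut =
                    !changeCountLock.tryLock(timeoutMillis / 2, TimeUnit.MILLISECONDS);
            if (!mainTimedOut) {
                changeCountLock.unlock();
            }
            timer.join();
            return mainTimedOut && timerTimedOut[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wait cycle reproduced: " + reproducesCycle(400));
    }
}
```

In this model the main thread owns the unfinished activation and wants the lock, while the timer thread owns the lock and wants the finished activation. Without the timeouts neither side could ever proceed, which is the shape of the hang described above.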