Ryan Moquin created FELIX-6190:
----------------------------------
Summary: Declarative services component implementing
EventHookListener deadlocks SCR.
Key: FELIX-6190
URL: https://issues.apache.org/jira/browse/FELIX-6190
Project: Felix
Issue Type: Bug
Components: Declarative Services (SCR)
Affects Versions: scr-2.1.16
Reporter: Ryan Moquin
When a Declarative Services component that implements EventListenerHook is
loaded by SCR, a deadlock occurs. SCR attempts to get the service so that it
can deliver event notifications to it while it is still in the process of
loading that same service. Below is a breakdown of the deadlock stack traces we
ran into; I spent some time identifying the services being interacted with at
each stage of the thread stack traces to reach this conclusion. After some
thought, it seems the fix would be to check whether an EventListenerHook that
needs to be loaded is the same service that is currently being loaded. I THINK
that would prevent this deadlock. The problem can be worked around, but it is
confusing when it occurs. Scott Lewis (who runs the ECF project) said it was
intermittent for him. I ran into it with Equinox first, switched to Felix, and
then hit it every time I ran the project from an exported bndtools jar with
ECF. Scott initially logged this against Equinox and there was some discussion
there; I'm attaching that issue to this one in case it's useful.
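The check proposed above could look roughly like the following minimal sketch. It is a hypothetical model, not Felix SCR or framework code: `HookGuardSketch`, `Hook`, `beginActivation`, `endActivation` and `deliver` are invented names, and services are identified by plain strings rather than ServiceReference objects. The dispatcher records which services are still being activated and skips any hook in that set instead of blocking on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the proposed check; not Felix SCR or framework code.
// Services are identified by name strings instead of ServiceReference objects.
public class HookGuardSketch {

    /** Stand-in for an EventListenerHook service (hypothetical). */
    interface Hook {
        void event(String serviceEvent);
    }

    // Names of services whose activation is still in progress, on any thread.
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();
    // Registered hooks, sorted by name so iteration order is deterministic.
    private final Map<String, Hook> hooks = new ConcurrentSkipListMap<>();

    void registerHook(String name, Hook hook) {
        hooks.put(name, hook);
    }

    void beginActivation(String name) {
        inProgress.add(name);
    }

    void endActivation(String name) {
        inProgress.remove(name);
    }

    /** Delivers the event to every hook not currently being activated; returns the skipped names. */
    List<String> deliver(String serviceEvent) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, Hook> entry : hooks.entrySet()) {
            if (inProgress.contains(entry.getKey())) {
                // Fetching this hook now would wait on its own unfinished activation.
                skipped.add(entry.getKey());
                continue;
            }
            entry.getValue().event(serviceEvent);
        }
        return skipped;
    }

    public static void main(String[] args) {
        HookGuardSketch dispatcher = new HookGuardSketch();
        List<String> received = new ArrayList<>();
        dispatcher.registerHook("TopologyManager", ev -> received.add("TopologyManager:" + ev));
        dispatcher.registerHook("OtherHook", ev -> received.add("OtherHook:" + ev));

        dispatcher.beginActivation("TopologyManager");
        List<String> skipped = dispatcher.deliver("REGISTERED");
        dispatcher.endActivation("TopologyManager");
        dispatcher.deliver("MODIFIED");

        // prints: skipped=[TopologyManager]
        System.out.println("skipped=" + skipped);
        // prints: received=[OtherHook:REGISTERED, OtherHook:MODIFIED, TopologyManager:MODIFIED]
        System.out.println("received=" + received);
    }
}
```

One trade-off of this approach: a skipped hook never sees that event (here TopologyManager misses its own REGISTERED event), and the check is not atomic with respect to activation finishing, so this only illustrates the idea rather than being a complete fix.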
In the breakdown and stack traces below, the TopologyManager class (from the
ECF project) is being loaded by SCR. That class implements the
EventListenerHook interface:
Main thread:
1. SCR tries to register the TopologyManager service.
2. A service event of type 1 (ServiceEvent.REGISTERED) is fired.
3. Equinox/Felix iterates over the event listener hooks, of which
   TopologyManager is one, so it tries to get the TopologyManager service in
   order to deliver the notification.
4. An attempt is made to retrieve the service count service to update the
   change count.
5. ComponentRegistry.updateChangeCount() is called.
6. It blocks waiting to lock the changeCountLock monitor.
Timer Thread 0:
1. ComponentRegistry locks changeCountLock.
2. The SCR service properties are modified (service.changecount).
3. A service event of type 2 (ServiceEvent.MODIFIED) is fired.
4. It tries to retrieve TopologyManager, because TopologyManager is an
   EventListenerHook that must be notified of the event.
5. It then waits on the service count latch of the static class in
   ServiceHolder.
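The two-thread breakdown above is a classic lock-ordering deadlock: each thread holds one resource and waits for the one the other thread holds. The following sketch models it with plain java.util.concurrent primitives, assuming that changeCountLock is an ordinary monitor and that the ServiceHolder latch behaves like a CountDownLatch released when TopologyManager activation completes. `ScrDeadlockModel` and the thread names are hypothetical; this is not Felix or Equinox code.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

// Hypothetical two-thread model of the deadlock described in FELIX-6190.
// Names mirror the report, but this is not Felix or Equinox code.
public class ScrDeadlockModel {

    /** Starts both threads and returns them; they are expected never to finish. */
    static Thread[] start() {
        // Stands in for ComponentRegistry's changeCountLock monitor (assumption).
        final Object changeCountLock = new Object();
        // Stands in for the ServiceHolder latch, released once TopologyManager activation completes (assumption).
        final CountDownLatch topologyManagerReady = new CountDownLatch(1);
        // Forces the bad interleaving: the timer thread holds the monitor before main asks for it.
        final CyclicBarrier bothHoldFirst = new CyclicBarrier(2);

        Thread main = new Thread(() -> {
            // TopologyManager activation is in progress, so the latch is not yet released.
            awaitBarrier(bothHoldFirst);
            synchronized (changeCountLock) {      // updateChangeCount(): blocks, the timer thread holds the monitor
                topologyManagerReady.countDown(); // never reached, so activation never completes
            }
        }, "main-model");

        Thread timer = new Thread(() -> {
            synchronized (changeCountLock) {      // updateChangeCount() holds the monitor
                awaitBarrier(bothHoldFirst);
                try {
                    // Delivering MODIFIED needs the TopologyManager hook, so wait for its activation.
                    topologyManagerReady.await(); // waits forever: main-model is blocked on the monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "timer-model");

        Thread[] threads = {main, timer};
        for (Thread t : threads) {
            t.setDaemon(true); // lets the JVM exit even though these threads never finish
            t.start();
        }
        return threads;
    }

    private static void awaitBarrier(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = start();
        for (Thread t : threads) {
            t.join(500);
        }
        for (Thread t : threads) {
            System.out.println(t.getName() + " alive=" + t.isAlive() + " state=" + t.getState());
        }
    }
}
```

Run as is, both threads should still be alive after the joins time out, with main-model BLOCKED on the monitor and timer-model WAITING on the latch; they are daemon threads so the JVM can still exit.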
The stack trace is from Scott. I didn't save the stack traces from the threads
I was investigating, but I can easily get them if the explanation above isn't
enough to reproduce the problem.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)