[
https://issues.apache.org/jira/browse/FELIX-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791366#comment-13791366
]
Pierre De Rop commented on FELIX-3910:
--------------------------------------
I have attached the patch candidate (FELIX-3910.patch.5.tgz).
can you please check if it works in your environment ?
All tests are passing, except the two following tests:
1)
testDynamicComponentStateListingLifeCycle2(org.apache.felix.dm.test.ComponentLifeCycleTest)
I think this is a pending issue, and the test also fails without the patch
2)
testServiceRegistrationAndConsumption(org.apache.felix.dm.test.AspectAwareServiceDependencyTest)
This test also fails without the patch: I looked into it quickly and think that
the pb might come from the dependency of the "ServiceConsumerCallbacks"
component, which is optional and the swap method is called twice (the second
swap call occurs when the "ServiceConsumerCallbacks" component is removed
because the dependency is optional ... but this can be discussed in another
jira issue.
patch description:
-----------------------
- Added a new "BlockingSerialExecutor" class which basically contains most of
the logic that is currently done in the enqueueCallback, waitForCallbackLock,
execute and releaseCallback methods: this class ensures that a single leader
thread is executing a given task, and other threads are blocked until the
leader thread is done. Additionally, I had to allow this new executor to be
reentrant: when one executing task reschedule a new sub-task: then we need to
execute the subtask immediately. This is required for instance when the
serialized "addedService(ServiceReference ref, Object service)" method calls
"ds.dependencyAvailable(this)", which then calls the serialized
"invokeAdded(final DependencyService service)" method ... see the patch for
more details.
I added this new class because I think we need to reuse it in other kind of
dependencies, which may have also some concurrency issues.
- I then reworked a little the ServiceDependencyImpl class like this:
* all public methods are regrouped in the beginning of the class file, then
comes the protected methods, and then the private methods
* serialized some public methods, in order to fix the race conditions.
* made protected the getService() method, which also impacts some minor modifs
in the TemporalServiceDependencyImpl class.
* made volatile or final some attributes in the ServiceDependencyImpl /
ComponentImpl / TemporalServiceDependencyImpl classes: Using pax-exam 3.0.0 (in
test2), I came across some subtle problems regarding some attributes which are
modified from a synchronized block, but read from unsynchronized blocks. So, to
fix the problem, I had to use volatile or final ...
* did a small patch in the ServiceDependencyImpl.makeUnavailable method:
private synchronized boolean makeUnavailable() {
if ((isAvailable()) && (m_isStarted == false ||
!m_tracker.hasReference())) {
m_isAvailable = false;
return true;
}
return false;
}
Here, the "m_isStarted" flag must be checked in order to avoid a potential NPE,
when m_tracker is null. The m_tracker can be null when many service
dependencies are unregistered concurrently (it's tricky and difficult to
explain ...).
Please notice that since most of the public methods in the
ServiceDependencyImpl class are now serialized, it's now making sense to remove
many synchronized blocks.
However, I did not do it because I prefer to play safe for now, and we can
probably do it in a next step. I prefer for now to minimize code modifications.
The only synchronized blocks I had to comment are in the
handleAspectAwareAdded/handleAspectAwareRemoved methods: Indeed, these methods
are now fully serialized; and since they invoke swap callbacks, I prefer to
not hold any locks during swap callback invocations.
Let me know if this is not breaking anything from your side and if you are ok
for a commit.
thanks;
> Race conditions in DependencyManager
> ------------------------------------
>
> Key: FELIX-3910
> URL: https://issues.apache.org/jira/browse/FELIX-3910
> Project: Felix
> Issue Type: Bug
> Components: Dependency Manager
> Affects Versions: dependencymanager-3.0.0
> Environment: jdk1.6, jdk1.7, linux fc16
> Reporter: Pierre De Rop
> Attachments: FELIX-3910-patch, FELIX-3910.patch.2,
> FELIX-3910.patch.3, FELIX-3910.patch.4, FELIX-3910.patch.5.tgz
>
>
> In a multi threaded context, where dependencies are injected concurrently
> from different threads, we came across some exceptions which seem to take
> place from dependencymanager.
> I have tried to reproduce the problems using a paxexam test which I will
> commit.
> Not all exceptions are reproduced by the test case, but I think that the
> testcase really reproduces a problem.
> I also have a candidate patch, which I will submit to this jira issue.
> Here are the exceptions we have seen:
> first stacktrace seen:
> ===============
> ERROR: Bundle test.dm [21] EventDispatcher: Error during dispatch.
> (java.lang.NullPointerException)
> java.lang.NullPointerException
> at
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.addedService(ServiceDependencyImpl.java:481)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdded(ServiceTracker.java:1325)
> at
> org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:290)
> at
> org.apache.felix.dm.tracker.AbstractTracked.track(AbstractTracked.java:236)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1206)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
> at
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
> at
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
> at
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
> at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
> at org.apache.felix.framework.Felix.registerService(Felix.java:3275)
> at
> org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
> at
> org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:320)
> at test.dm.race.RaceTest$RegistrationHelper$1.run(RaceTest.java:104)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> second exceptions:
> ==============
> ERROR: EventDispatcher: Error during dispatch.
> (java.lang.NullPointerException)
> java.lang.NullPointerException
> at
> org.apache.felix.dm.impl.ComponentImpl.invokeCallbackMethod(ComponentImpl.java:686)
> at
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invoke(ServiceDependencyImpl.java:704)
> at
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invokeRemoved(ServiceDependencyImpl.java:666)
> at
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.removedService(ServiceDependencyImpl.java:520)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1351)
> at
> org.apache.felix.dm.tracker.AbstractTracked.untrack(AbstractTracked.java:359)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1285)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
> at
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:878)
> at
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:732)
> at
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:662)
> at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:3587)
> at org.apache.felix.framework.Felix.access$000(Felix.java:40)
> at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:625)
> at
> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:117)
> at
> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:128)
> at org.apache.felix.dm.test.RaceTest$AFactory$2.run(RaceTest.java:151)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
--
This message was sent by Atlassian JIRA
(v6.1#6144)