[ 
https://issues.apache.org/jira/browse/FELIX-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791366#comment-13791366
 ] 

Pierre De Rop commented on FELIX-3910:
--------------------------------------

I have attached the patch candidate (FELIX-3910.patch.5.tgz).

can you please check if it works in your environment ?
All tests are passing, except the two following tests:

1) 
testDynamicComponentStateListingLifeCycle2(org.apache.felix.dm.test.ComponentLifeCycleTest)

I think this is a pending issue, and the test also fails without the patch

2)   
testServiceRegistrationAndConsumption(org.apache.felix.dm.test.AspectAwareServiceDependencyTest)

This test also fails without the patch: I looked into it quickly and think that 
the pb might come from the dependency of the "ServiceConsumerCallbacks" 
component, which is optional and the swap method is called twice (the second 
swap call occurs when the "ServiceConsumerCallbacks" component is removed 
because the dependency is optional ... but this can be discussed in another 
jira issue.

patch description:
-----------------------

- Added a new "BlockingSerialExecutor" class which basically contains most of 
the logic that is currently done in the enqueueCallback, waitForCallbackLock, 
execute and releaseCallback methods: this class ensures that a single leader 
thread is executing a given task, and other threads are blocked until the 
leader thread is done. Additionally, I had to allow this new executor to be 
reentrant: when one executing task reschedule a new sub-task: then we need to 
execute the subtask immediately. This is required for instance when the 
serialized "addedService(ServiceReference ref, Object service)" method calls 
"ds.dependencyAvailable(this)", which then calls the serialized 
"invokeAdded(final DependencyService service)" method ... see the patch for 
more details.

I added this new class because I think we need to reuse it in other kind of 
dependencies, which may have also some concurrency issues.

- I then reworked a little the ServiceDependencyImpl class like this:

* all public methods are regrouped in the beginning of the class file, then 
comes the protected methods, and then the private methods

* serialized some public methods, in order to fix the race conditions.

* made protected the getService() method, which also impacts some minor modifs 
in the TemporalServiceDependencyImpl class.

* made volatile or final some attributes in the ServiceDependencyImpl / 
ComponentImpl / TemporalServiceDependencyImpl classes: Using pax-exam 3.0.0 (in 
test2), I came across some subtle problems regarding some attributes which are 
modified from a synchronized block, but read from unsynchronized blocks. So, to 
fix the problem, I had to use volatile or final ...

* did a small patch in the ServiceDependencyImpl.makeUnavailable method: 

    private synchronized boolean makeUnavailable() {
        if ((isAvailable()) && (m_isStarted == false || 
!m_tracker.hasReference())) {
            m_isAvailable = false;
            return true;
        }
        return false;
    }

Here, the "m_isStarted" flag must be checked in order to avoid a potential NPE, 
when m_tracker is null. The m_tracker can be null when many service 
dependencies are unregistered concurrently (it's tricky and difficult to 
explain ...).

Please notice that since most of the public methods in the 
ServiceDependencyImpl class are now serialized, it's now making sense to remove 
many synchronized blocks. 
However, I did not do it because I prefer to play safe for now, and we can 
probably do it in a next step. I prefer for now to minimize code modifications. 
The only synchronized blocks I had to comment are in the  
handleAspectAwareAdded/handleAspectAwareRemoved methods: Indeed, these methods 
are now fully serialized; and since they invoke swap callbacks,  I prefer  to 
not hold any locks during swap callback invocations.

Let me know if this is not breaking anything from your side and if you are ok 
for a commit.
thanks;




> Race conditions in DependencyManager
> ------------------------------------
>
>                 Key: FELIX-3910
>                 URL: https://issues.apache.org/jira/browse/FELIX-3910
>             Project: Felix
>          Issue Type: Bug
>          Components: Dependency Manager
>    Affects Versions: dependencymanager-3.0.0
>         Environment: jdk1.6, jdk1.7, linux fc16
>            Reporter: Pierre De Rop
>         Attachments: FELIX-3910-patch, FELIX-3910.patch.2, 
> FELIX-3910.patch.3, FELIX-3910.patch.4, FELIX-3910.patch.5.tgz
>
>
> In a multi threaded context, where dependencies are injected concurrently 
> from different threads, we  came across some exceptions which seem to take 
> place from dependencymanager.
> I have tried to reproduce the problems using a paxexam test which I will 
> commit.
> Not all exceptions are reproduced by the test case, but I think that the 
> testcase really reproduces a problem. 
> I also have a candidate patch, which I will submit to this jira issue.
> Here are the exceptions we have seen:
> first stacktrace seen:
> ===============
> ERROR: Bundle test.dm [21] EventDispatcher: Error during dispatch. 
> (java.lang.NullPointerException)
> java.lang.NullPointerException
>         at 
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.addedService(ServiceDependencyImpl.java:481)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdded(ServiceTracker.java:1325)
>         at 
> org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:290)
>         at 
> org.apache.felix.dm.tracker.AbstractTracked.track(AbstractTracked.java:236)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1206)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
>         at 
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>         at 
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>         at 
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>         at org.apache.felix.framework.Felix.registerService(Felix.java:3275)
>         at 
> org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
>         at 
> org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:320)
>         at test.dm.race.RaceTest$RegistrationHelper$1.run(RaceTest.java:104)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> second exceptions:
> ==============
> ERROR: EventDispatcher: Error during dispatch. 
> (java.lang.NullPointerException)
> java.lang.NullPointerException
>         at 
> org.apache.felix.dm.impl.ComponentImpl.invokeCallbackMethod(ComponentImpl.java:686)
>         at 
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invoke(ServiceDependencyImpl.java:704)
>         at 
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.invokeRemoved(ServiceDependencyImpl.java:666)
>         at 
> org.apache.felix.dm.impl.dependencies.ServiceDependencyImpl.removedService(ServiceDependencyImpl.java:520)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1351)
>         at 
> org.apache.felix.dm.tracker.AbstractTracked.untrack(AbstractTracked.java:359)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1285)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1101)
>         at 
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:878)
>         at 
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:732)
>         at 
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:662)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:3587)
>         at org.apache.felix.framework.Felix.access$000(Felix.java:40)
>         at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:625)
>         at 
> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:117)
>         at 
> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:128)
>         at org.apache.felix.dm.test.RaceTest$AFactory$2.run(RaceTest.java:151)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to