Pierre De Rop created FELIX-5716:
------------------------------------

             Summary: Dead Lock in DM
                 Key: FELIX-5716
                 URL: https://issues.apache.org/jira/browse/FELIX-5716
             Project: Felix
          Issue Type: Bug
    Affects Versions: org.apache.felix.dependencymanager-r1
            Reporter: Pierre De Rop
            Assignee: Pierre De Rop
            Priority: Critical


I just found an unfortunate deadlock when using latest DM r11 and latest SCR 
2.0.12 in the same JVM.

I believe that the same issue applies to both DM and SCR (I may have to open a 
seperate issue for SCR).

First, here is the deadlock:

{code}
Found one Java-level deadlock:
=============================
"CM Event Dispatcher (Fire ConfigurationEvent: 
pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
  waiting to lock monitor 0x00007f8188004538 (object 0x00000000eb38d420, a 
org.apache.felix.dm.tracker.ServiceTracker$Tracked),
  which is held by "FelixDispatchQueue"
"FelixDispatchQueue":
  waiting for ownable synchronizer 0x00000000c0699f30, (a 
java.util.concurrent.locks.ReentrantLock$FairSync),
  which is held by "CM Event Dispatcher (Fire ConfigurationEvent: 
pid=com.alcatel.as.http.ioh.impl.HttpIOH)"

Java stack information for the threads listed above:
===================================================
"CM Event Dispatcher (Fire ConfigurationEvent: 
pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
        at 
org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1140)
        - waiting to lock <0x00000000eb38d420> (a 
org.apache.felix.dm.tracker.ServiceTracker$Tracked)
        at 
org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1054)
        at 
org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
        at 
org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
        at 
org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
        at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4579)
        at org.apache.felix.framework.Felix.access$000(Felix.java:105)
        at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:419)
        at 
org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:588)
        at 
org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:131)
        at 
org.apache.felix.scr.impl.manager.SingleComponentManager.updateServiceRegistration(SingleComponentManager.java:558)
        at 
org.apache.felix.scr.impl.manager.SingleComponentManager.modify(SingleComponentManager.java:755)
        at 
org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:645)
        at 
org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:609)
        at 
org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.configurationUpdated(ConfigurableComponentHolder.java:426)
        at 
org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent(RegionConfigurationSupport.java:284)
        at 
org.apache.felix.scr.impl.manager.RegionConfigurationSupport$1.configurationEvent(RegionConfigurationSupport.java:89)
        at 
org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2090)
        at 
org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2058)
        at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141)
        at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109)
        at java.lang.Thread.run(Thread.java:748)
"FelixDispatchQueue":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0699f30> (a 
java.util.concurrent.locks.ReentrantLock$FairSync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at 
java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
        at 
org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainLock(AbstractComponentManager.java:228)
        at 
org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainStateLock(AbstractComponentManager.java:279)
        at 
org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:806)
        at 
org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:347)
        at 
org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:247)
        at 
org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350)
        at org.apache.felix.framework.Felix.getService(Felix.java:3721)
        at 
org.apache.felix.framework.BundleContextImpl.getService(BundleContextImpl.java:470)
        at 
org.apache.felix.dm.impl.ServiceEventImpl.getEvent(ServiceEventImpl.java:86)
        at 
org.apache.felix.dm.impl.ServiceDependencyImpl.addingService(ServiceDependencyImpl.java:251)
        at 
org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:1276)
        at 
org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:279)
        at 
org.apache.felix.dm.tracker.AbstractTracked.trackInitial(AbstractTracked.java:188)
        - locked <0x00000000eb38d420> (a 
org.apache.felix.dm.tracker.ServiceTracker$Tracked)
        at 
org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:387)
        - locked <0x00000000eb38d420> (a 
org.apache.felix.dm.tracker.ServiceTracker$Tracked)
        - locked <0x00000000eb38cd00> (a 
org.apache.felix.dm.tracker.ServiceTracker)
        at 
org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:320)
        at 
org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:296)
        at 
org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:235)
        at 
org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:1360)
        at 
org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:1059)
        at 
org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:1010)
        at 
org.apache.felix.dm.impl.ComponentImpl.lambda$start$2(ComponentImpl.java:426)
        at 
org.apache.felix.dm.impl.ComponentImpl$$Lambda$7/201251057.run(Unknown Source)
        at 
org.apache.felix.dm.impl.SerialExecutor.runTask(SerialExecutor.java:138)
        at 
org.apache.felix.dm.impl.SerialExecutor.runTasks(SerialExecutor.java:120)
        at 
org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:86)
        at 
org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:105)
        at org.apache.felix.dm.impl.ComponentImpl.start(ComponentImpl.java:424)
        at 
org.apache.felix.dm.impl.ComponentScheduler.add(ComponentScheduler.java:69)
        at org.apache.felix.dm.DependencyManager.add(DependencyManager.java:136)
        at 
com.alcatel.as.service.coordinator.impl.Activator.init(Activator.java:16)
        at 
org.apache.felix.dm.DependencyActivatorBase.start(DependencyActivatorBase.java:75)
        at 
org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl$DeployedBundle.start(BundleInstallerImpl.java:1056)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.startBundles(BundleInstallerImpl.java:728)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.finishInitialisation(BundleInstallerImpl.java:230)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:213)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.start(BundleInstallerImpl.java:174)
        at 
org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
        at org.apache.felix.framework.Felix.updateBundle(Felix.java:2506)
        at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1018)
        at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1004)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.checkSelfUpdate(BundleInstallerImpl.java:273)
        at 
com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:192)
        at 
org.apache.felix.framework.EventDispatcher.invokeFrameworkListenerCallback(EventDispatcher.java:881)
        at 
org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:830)
        at 
org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1147)
        at 
org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
        at 
org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
        at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
{code}

now, here is my first interpretation of what is going on:

in SCR:
=====

Configuration Admin is delivering a configuration update event to SCR. So, the 
org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent()
 method is then called. This method then indirectly triggers a call to the 
SingleComponentManager.reconfigure method, which then calls 
SingleComponentManager.modify method. Here is the modify() method:

{code}
    private boolean modify(boolean configurationDeleted)
    {
        //0 SCR 112.7.1 If configuration is deleted, and version is < 1.3 and 
no flag set, then deactivate unconditionally.
        // For version 1.3 and later, or with a flag, more sensible behavior is 
allowed.
        if ( configurationDeleted && 
!getComponentMetadata().isDeleteCallsModify()){
            return false;
        }

        // 1. no live update if there is no declared method
        if ( getComponentMetadata().getModified() == null )
        {
            log( LogService.LOG_DEBUG, "No modified method, cannot update 
dynamically", null );
            return false;
        }
        // invariant: we have a modified method name

        // 2. get and check configured method
        // invariant: modify method is configured and found

        // 3. check whether we can dynamically apply the configuration if
        // any target filters influence the bound services
        final Map<String, Object> props = getProperties();
        for ( DependencyManager<S, ?> dm: getDependencyManagers() )
        {
            if ( !dm.canUpdateDynamically( props ) )
            {
                log( LogService.LOG_DEBUG,
                        "Cannot dynamically update the configuration due to 
dependency changes induced on dependency {0}",
                        new Object[] {dm.getName()}, null );
                return false;
            }
        }
        // invariant: modify method existing and no static bound service changes

        // 4. call method (nothing to do when failed, since it has already been 
logged)
        //   (call with non-null default result to continue even if the
        //    modify method call failed)
        obtainStateLock(  );
        try
        {
            //cf 112.5.12 where invoking modified method before updating target 
services is specified.
            final MethodResult result = invokeModifiedMethod();
            updateTargets( props );
            if ( result == null )
            {
                // log an error if the declared method cannot be found
                log( LogService.LOG_ERROR, "Declared modify method ''{0}'' 
cannot be found, configuring by reactivation",
                        new Object[] {getComponentMetadata().getModified()}, 
null );
                return false;
            }

            // 5. update the target filter on the services now, this may still
            // result in unsatisfied dependencies, in which case we abort
            // this dynamic update and have the component be deactivated
            if ( !verifyDependencyManagers() )
            {
                log( LogService.LOG_DEBUG,
                        "Updating the service references caused at least one 
reference to become unsatisfied, deactivating component",
                        null );
                return false;
            }

            // 6. update service registration properties if we didn't just do it
            if ( result.hasResult() )
            {
                setServiceProperties( result, null );
            }
            else
            {
                updateServiceRegistration();
            }
     ...
{code}

This is problematic because we see that the modify method is invoking the 
updateServiceRegistration while holding the StateLock. (the obtainStateLock() 
method is called and then the service registration is updaded). 

Here, the listener is a DM component, but this component is internally 
synchronized on it's service tracker, and is currently blocked, trying to 
dereference the DS service, but it can't because the DS service state lock is 
currently held by the modify method.

in the next section, I'm now analyzing what is doing DM and will propose a 
patch in order to make sure DM does not hold any locks while calling 
bundleContext.getService(ref) ... this will fix the issue.

Dependency Manager
=================

In dependency manager, now let's analyze the the FelixDispatchQueue stacktrace: 
we open the DM service tracker, but the open method (in the DM service tracker) 
calls the AbstractTracked.trackInitial while holding the lock on 
ServiceTracker$Tracked. Then this triggers the customizer "addingService" 
method call, which then perform a getService. But the getService then tries to 
dereference a declarative service component, and the stacktrace is then blocked 
because the DS component state lock is currently locked by the other "CM Event 
Dispatcher"
So, in the end, I think that there is something wrong in both SCR and DM :

1) SCR obtain the state lock while update the service registration.
This is , I think, unfortunate because updating the service registration while 
holding a lock is like hodling a lock while calling foreigh code (you don't 
know what will do the service listener when you update the service 
registration).

2) and DM calls the customizer "addingService" method while holding the lock on 
the service tracker. This is also not satisfying.

So, at minimul, in DM I do believe that we should manage to not invoke the 
customizer.addingService while holding the lock on the DM service tracker.

now, I have given a look into the DM ServiceTracker.open method and it is 
currently implemented like this:

{code}
        public void open(boolean trackAllServices, boolean trackAllAspects) {
                if (debug) {
                        System.out.println("[ServiceTracker] " + debugKey + " 
T" + Thread.currentThread().getId() + " open");
                }
                final Tracked t;
                synchronized (this) {
                        if (tracked != null) {
                                return;
                        }
                        if (DEBUG) {
                                System.out.println("ServiceTracker.open: " + 
filter); 
                        }
                        m_trackAllAspects = trackAllAspects;
                        t = trackAllServices ? new AllTracked() : new Tracked();
                        synchronized (t) {
                                try {
                                        context.addServiceListener(t, 
listenerFilter);
                                        ServiceReference[] references = null;
                                        if (trackClass != null) {
                                                references = 
getInitialReferences(trackAllServices,
                                                                trackClass, 
null);
                                        }
                                        else {
                                                if (trackReference != null) {
                                                        if 
(trackReference.getBundle() != null) {
                                                                references = 
new ServiceReference[] {trackReference};
                                                        }
                                                }
                                                else { /* user supplied filter 
*/
                                                        references = 
getInitialReferences(trackAllServices,
                                                                        null,
                                                                        
(listenerFilter != null) ? listenerFilter
                                                                                
        : filter.toString());
                                                }
                                        }
                                        /* set tracked with the initial 
references */
                                        t.setInitial(references);
                                        
                                        // only actually schedules the actions 
for execution within this synchronized block,
                                        // but do the actual execution 
afterwards.
                                        t.trackInitial(); 

                                }
                                catch (InvalidSyntaxException e) {
                                        throw new RuntimeException(
                                                        "unexpected 
InvalidSyntaxException: "
                                                                        + 
e.getMessage(), e); 
                                }
                        }
                        tracked = t;
                }
                /* Call tracked outside of synchronized region */
                // just trigger the executor
                t.getExecutor().execute();
        }
{code}

As you can see the "t.trackInitial()" is invoked from within the synchronized 
block. This method actually schedules customizer "added" callbacks, which will 
be invoked properly outside of the synchronized block  (when 
t.getExecutor().executor() will be called).
But please notice that the the issue is that the t.trackInitial() will also 
invoke the "addingService" customizer callback from the synchronized block. 
Doing so will end up dereferencing the DS service reference while we are 
holding the ServiceTracker lock.

Interestingly, notice that the BundleTracker is not calling the 
t.trackInitial() method from the synchronized block.

so, I have attempted to make a patch, which seems to resolve the issue. All 
tests are passing OK, and we don't experience the deadlock anymore, and I would 
appreciate if another DM developer or user could take a look at my proposed 
patch, to make sure I don't have messed up :-)

thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to