[ 
https://issues.apache.org/jira/browse/FELIX-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre De Rop updated FELIX-5716:
---------------------------------
    Attachment: deadlock.proposed.patch.txt

Attaching the proposed patch.

> Dead Lock in DM
> ---------------
>
>                 Key: FELIX-5716
>                 URL: https://issues.apache.org/jira/browse/FELIX-5716
>             Project: Felix
>          Issue Type: Bug
>    Affects Versions: org.apache.felix.dependencymanager-r1
>            Reporter: Pierre De Rop
>            Assignee: Pierre De Rop
>            Priority: Critical
>         Attachments: deadlock.proposed.patch.txt
>
>
> I just found an unfortunate deadlock when using latest DM r11 and latest SCR 
> 2.0.12 in the same JVM.
> I believe that the same issue applies to both DM and SCR (I may have to open 
> a seperate issue for SCR).
> First, here is the deadlock:
> {code}
> Found one Java-level deadlock:
> =============================
> "CM Event Dispatcher (Fire ConfigurationEvent: 
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
>   waiting to lock monitor 0x00007f8188004538 (object 0x00000000eb38d420, a 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked),
>   which is held by "FelixDispatchQueue"
> "FelixDispatchQueue":
>   waiting for ownable synchronizer 0x00000000c0699f30, (a 
> java.util.concurrent.locks.ReentrantLock$FairSync),
>   which is held by "CM Event Dispatcher (Fire ConfigurationEvent: 
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)"
> Java stack information for the threads listed above:
> ===================================================
> "CM Event Dispatcher (Fire ConfigurationEvent: 
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1140)
>         - waiting to lock <0x00000000eb38d420> (a 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1054)
>         at 
> org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
>         at 
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
>         at 
> org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
>         at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4579)
>         at org.apache.felix.framework.Felix.access$000(Felix.java:105)
>         at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:419)
>         at 
> org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:588)
>         at 
> org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:131)
>         at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.updateServiceRegistration(SingleComponentManager.java:558)
>         at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.modify(SingleComponentManager.java:755)
>         at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:645)
>         at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:609)
>         at 
> org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.configurationUpdated(ConfigurableComponentHolder.java:426)
>         at 
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent(RegionConfigurationSupport.java:284)
>         at 
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport$1.configurationEvent(RegionConfigurationSupport.java:89)
>         at 
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2090)
>         at 
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2058)
>         at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141)
>         at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109)
>         at java.lang.Thread.run(Thread.java:748)
> "FelixDispatchQueue":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000c0699f30> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at 
> java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
>         at 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainLock(AbstractComponentManager.java:228)
>         at 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainStateLock(AbstractComponentManager.java:279)
>         at 
> org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:806)
>         at 
> org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:347)
>         at 
> org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:247)
>         at 
> org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350)
>         at org.apache.felix.framework.Felix.getService(Felix.java:3721)
>         at 
> org.apache.felix.framework.BundleContextImpl.getService(BundleContextImpl.java:470)
>         at 
> org.apache.felix.dm.impl.ServiceEventImpl.getEvent(ServiceEventImpl.java:86)
>         at 
> org.apache.felix.dm.impl.ServiceDependencyImpl.addingService(ServiceDependencyImpl.java:251)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:1276)
>         at 
> org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:279)
>         at 
> org.apache.felix.dm.tracker.AbstractTracked.trackInitial(AbstractTracked.java:188)
>         - locked <0x00000000eb38d420> (a 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:387)
>         - locked <0x00000000eb38d420> (a 
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
>         - locked <0x00000000eb38cd00> (a 
> org.apache.felix.dm.tracker.ServiceTracker)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:320)
>         at 
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:296)
>         at 
> org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:235)
>         at 
> org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:1360)
>         at 
> org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:1059)
>         at 
> org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:1010)
>         at 
> org.apache.felix.dm.impl.ComponentImpl.lambda$start$2(ComponentImpl.java:426)
>         at 
> org.apache.felix.dm.impl.ComponentImpl$$Lambda$7/201251057.run(Unknown Source)
>         at 
> org.apache.felix.dm.impl.SerialExecutor.runTask(SerialExecutor.java:138)
>         at 
> org.apache.felix.dm.impl.SerialExecutor.runTasks(SerialExecutor.java:120)
>         at 
> org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:86)
>         at 
> org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:105)
>         at 
> org.apache.felix.dm.impl.ComponentImpl.start(ComponentImpl.java:424)
>         at 
> org.apache.felix.dm.impl.ComponentScheduler.add(ComponentScheduler.java:69)
>         at 
> org.apache.felix.dm.DependencyManager.add(DependencyManager.java:136)
>         at 
> com.alcatel.as.service.coordinator.impl.Activator.init(Activator.java:16)
>         at 
> org.apache.felix.dm.DependencyActivatorBase.start(DependencyActivatorBase.java:75)
>         at 
> org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
>         at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
>         at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl$DeployedBundle.start(BundleInstallerImpl.java:1056)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.startBundles(BundleInstallerImpl.java:728)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.finishInitialisation(BundleInstallerImpl.java:230)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:213)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.start(BundleInstallerImpl.java:174)
>         at 
> org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
>         at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
>         at org.apache.felix.framework.Felix.updateBundle(Felix.java:2506)
>         at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1018)
>         at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1004)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.checkSelfUpdate(BundleInstallerImpl.java:273)
>         at 
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:192)
>         at 
> org.apache.felix.framework.EventDispatcher.invokeFrameworkListenerCallback(EventDispatcher.java:881)
>         at 
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:830)
>         at 
> org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1147)
>         at 
> org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
>         at 
> org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
>         at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {code}
> now, here is my first interpretation of what is going on:
> in SCR:
> =====
> Configuration Admin is delivering a configuration update event to SCR. So, 
> the 
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent()
>  method is then called. This method then indirectly triggers a call to the 
> SingleComponentManager.reconfigure method, which then calls 
> SingleComponentManager.modify method. Here is the modify() method:
> {code}
>     private boolean modify(boolean configurationDeleted)
>     {
>         //0 SCR 112.7.1 If configuration is deleted, and version is < 1.3 and 
> no flag set, then deactivate unconditionally.
>         // For version 1.3 and later, or with a flag, more sensible behavior 
> is allowed.
>         if ( configurationDeleted && 
> !getComponentMetadata().isDeleteCallsModify()){
>             return false;
>         }
>         // 1. no live update if there is no declared method
>         if ( getComponentMetadata().getModified() == null )
>         {
>             log( LogService.LOG_DEBUG, "No modified method, cannot update 
> dynamically", null );
>             return false;
>         }
>         // invariant: we have a modified method name
>         // 2. get and check configured method
>         // invariant: modify method is configured and found
>         // 3. check whether we can dynamically apply the configuration if
>         // any target filters influence the bound services
>         final Map<String, Object> props = getProperties();
>         for ( DependencyManager<S, ?> dm: getDependencyManagers() )
>         {
>             if ( !dm.canUpdateDynamically( props ) )
>             {
>                 log( LogService.LOG_DEBUG,
>                         "Cannot dynamically update the configuration due to 
> dependency changes induced on dependency {0}",
>                         new Object[] {dm.getName()}, null );
>                 return false;
>             }
>         }
>         // invariant: modify method existing and no static bound service 
> changes
>         // 4. call method (nothing to do when failed, since it has already 
> been logged)
>         //   (call with non-null default result to continue even if the
>         //    modify method call failed)
>         obtainStateLock(  );
>         try
>         {
>             //cf 112.5.12 where invoking modified method before updating 
> target services is specified.
>             final MethodResult result = invokeModifiedMethod();
>             updateTargets( props );
>             if ( result == null )
>             {
>                 // log an error if the declared method cannot be found
>                 log( LogService.LOG_ERROR, "Declared modify method ''{0}'' 
> cannot be found, configuring by reactivation",
>                         new Object[] {getComponentMetadata().getModified()}, 
> null );
>                 return false;
>             }
>             // 5. update the target filter on the services now, this may still
>             // result in unsatisfied dependencies, in which case we abort
>             // this dynamic update and have the component be deactivated
>             if ( !verifyDependencyManagers() )
>             {
>                 log( LogService.LOG_DEBUG,
>                         "Updating the service references caused at least one 
> reference to become unsatisfied, deactivating component",
>                         null );
>                 return false;
>             }
>             // 6. update service registration properties if we didn't just do 
> it
>             if ( result.hasResult() )
>             {
>                 setServiceProperties( result, null );
>             }
>             else
>             {
>                 updateServiceRegistration();
>             }
>      ...
> {code}
> This is problematic because we see that the modify method is invoking the 
> updateServiceRegistration while holding the StateLock. (the obtainStateLock() 
> method is called and then the service registration is updaded). 
> Here, the listener is a DM component, but this component is internally 
> synchronized on it's service tracker, and is currently blocked, trying to 
> dereference the DS service, but it can't because the DS service state lock is 
> currently held by the modify method.
> in the next section, I'm now analyzing what is doing DM and will propose a 
> patch in order to make sure DM does not hold any locks while calling 
> bundleContext.getService(ref) ... this will fix the issue.
> Dependency Manager
> =================
> In dependency manager, now let's analyze the the FelixDispatchQueue 
> stacktrace: we open the DM service tracker, but the open method (in the DM 
> service tracker) calls the AbstractTracked.trackInitial while holding the 
> lock on ServiceTracker$Tracked. Then this triggers the customizer 
> "addingService" method call, which then perform a getService. But the 
> getService then tries to dereference a declarative service component, and the 
> stacktrace is then blocked because the DS component state lock is currently 
> locked by the other "CM Event Dispatcher"
> So, in the end, I think that there is something wrong in both SCR and DM :
> 1) SCR obtain the state lock while update the service registration.
> This is , I think, unfortunate because updating the service registration 
> while holding a lock is like hodling a lock while calling foreigh code (you 
> don't know what will do the service listener when you update the service 
> registration).
> 2) and DM calls the customizer "addingService" method while holding the lock 
> on the service tracker. This is also not satisfying.
> So, at minimul, in DM I do believe that we should manage to not invoke the 
> customizer.addingService while holding the lock on the DM service tracker.
> now, I have given a look into the DM ServiceTracker.open method and it is 
> currently implemented like this:
> {code}
>       public void open(boolean trackAllServices, boolean trackAllAspects) {
>               if (debug) {
>                       System.out.println("[ServiceTracker] " + debugKey + " 
> T" + Thread.currentThread().getId() + " open");
>               }
>               final Tracked t;
>               synchronized (this) {
>                       if (tracked != null) {
>                               return;
>                       }
>                       if (DEBUG) {
>                               System.out.println("ServiceTracker.open: " + 
> filter); 
>                       }
>                       m_trackAllAspects = trackAllAspects;
>                       t = trackAllServices ? new AllTracked() : new Tracked();
>                       synchronized (t) {
>                               try {
>                                       context.addServiceListener(t, 
> listenerFilter);
>                                       ServiceReference[] references = null;
>                                       if (trackClass != null) {
>                                               references = 
> getInitialReferences(trackAllServices,
>                                                               trackClass, 
> null);
>                                       }
>                                       else {
>                                               if (trackReference != null) {
>                                                       if 
> (trackReference.getBundle() != null) {
>                                                               references = 
> new ServiceReference[] {trackReference};
>                                                       }
>                                               }
>                                               else { /* user supplied filter 
> */
>                                                       references = 
> getInitialReferences(trackAllServices,
>                                                                       null,
>                                                                       
> (listenerFilter != null) ? listenerFilter
>                                                                               
>         : filter.toString());
>                                               }
>                                       }
>                                       /* set tracked with the initial 
> references */
>                                       t.setInitial(references);
>                                       
>                                       // only actually schedules the actions 
> for execution within this synchronized block,
>                                       // but do the actual execution 
> afterwards.
>                                       t.trackInitial(); 
>                               }
>                               catch (InvalidSyntaxException e) {
>                                       throw new RuntimeException(
>                                                       "unexpected 
> InvalidSyntaxException: "
>                                                                       + 
> e.getMessage(), e); 
>                               }
>                       }
>                       tracked = t;
>               }
>               /* Call tracked outside of synchronized region */
>               // just trigger the executor
>               t.getExecutor().execute();
>       }
> {code}
> As you can see the "t.trackInitial()" is invoked from within the synchronized 
> block. This method actually schedules customizer "added" callbacks, which 
> will be invoked properly outside of the synchronized block  (when 
> t.getExecutor().executor() will be called).
> But please notice that the the issue is that the t.trackInitial() will also 
> invoke the "addingService" customizer callback from the synchronized block. 
> Doing so will end up dereferencing the DS service reference while we are 
> holding the ServiceTracker lock.
> Interestingly, notice that the BundleTracker is not calling the 
> t.trackInitial() method from the synchronized block.
> so, I have attempted to make a patch, which seems to resolve the issue. All 
> tests are passing OK, and we don't experience the deadlock anymore, and I 
> would appreciate if another DM developer or user could take a look at my 
> proposed patch, to make sure I don't have messed up :-)
> thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to