[
https://issues.apache.org/jira/browse/FELIX-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre De Rop resolved FELIX-5716.
----------------------------------
Resolution: Fixed
Committed the patch in revision 1828022.
> Dead Lock in DM
> ---------------
>
> Key: FELIX-5716
> URL: https://issues.apache.org/jira/browse/FELIX-5716
> Project: Felix
> Issue Type: Bug
> Affects Versions: org.apache.felix.dependencymanager-r1
> Reporter: Pierre De Rop
> Assignee: Pierre De Rop
> Priority: Critical
> Attachments: deadlock.proposed.patch.txt
>
>
> I just found an unfortunate deadlock when using latest DM r11 and latest SCR
> 2.0.12 in the same JVM.
> I believe that the same issue applies to both DM and SCR (I may have to open
> a seperate issue for SCR).
> First, here is the deadlock:
> {code}
> Found one Java-level deadlock:
> =============================
> "CM Event Dispatcher (Fire ConfigurationEvent:
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
> waiting to lock monitor 0x00007f8188004538 (object 0x00000000eb38d420, a
> org.apache.felix.dm.tracker.ServiceTracker$Tracked),
> which is held by "FelixDispatchQueue"
> "FelixDispatchQueue":
> waiting for ownable synchronizer 0x00000000c0699f30, (a
> java.util.concurrent.locks.ReentrantLock$FairSync),
> which is held by "CM Event Dispatcher (Fire ConfigurationEvent:
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)"
> Java stack information for the threads listed above:
> ===================================================
> "CM Event Dispatcher (Fire ConfigurationEvent:
> pid=com.alcatel.as.http.ioh.impl.HttpIOH)":
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1140)
> - waiting to lock <0x00000000eb38d420> (a
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1054)
> at
> org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990)
> at
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838)
> at
> org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545)
> at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4579)
> at org.apache.felix.framework.Felix.access$000(Felix.java:105)
> at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:419)
> at
> org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:588)
> at
> org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:131)
> at
> org.apache.felix.scr.impl.manager.SingleComponentManager.updateServiceRegistration(SingleComponentManager.java:558)
> at
> org.apache.felix.scr.impl.manager.SingleComponentManager.modify(SingleComponentManager.java:755)
> at
> org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:645)
> at
> org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:609)
> at
> org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.configurationUpdated(ConfigurableComponentHolder.java:426)
> at
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent(RegionConfigurationSupport.java:284)
> at
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport$1.configurationEvent(RegionConfigurationSupport.java:89)
> at
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2090)
> at
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2058)
> at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141)
> at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109)
> at java.lang.Thread.run(Thread.java:748)
> "FelixDispatchQueue":
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000c0699f30> (a
> java.util.concurrent.locks.ReentrantLock$FairSync)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
> at
> java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
> at
> org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainLock(AbstractComponentManager.java:228)
> at
> org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainStateLock(AbstractComponentManager.java:279)
> at
> org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:806)
> at
> org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:347)
> at
> org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:247)
> at
> org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350)
> at org.apache.felix.framework.Felix.getService(Felix.java:3721)
> at
> org.apache.felix.framework.BundleContextImpl.getService(BundleContextImpl.java:470)
> at
> org.apache.felix.dm.impl.ServiceEventImpl.getEvent(ServiceEventImpl.java:86)
> at
> org.apache.felix.dm.impl.ServiceDependencyImpl.addingService(ServiceDependencyImpl.java:251)
> at
> org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:1276)
> at
> org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:279)
> at
> org.apache.felix.dm.tracker.AbstractTracked.trackInitial(AbstractTracked.java:188)
> - locked <0x00000000eb38d420> (a
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
> at
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:387)
> - locked <0x00000000eb38d420> (a
> org.apache.felix.dm.tracker.ServiceTracker$Tracked)
> - locked <0x00000000eb38cd00> (a
> org.apache.felix.dm.tracker.ServiceTracker)
> at
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:320)
> at
> org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:296)
> at
> org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:235)
> at
> org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:1360)
> at
> org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:1059)
> at
> org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:1010)
> at
> org.apache.felix.dm.impl.ComponentImpl.lambda$start$2(ComponentImpl.java:426)
> at
> org.apache.felix.dm.impl.ComponentImpl$$Lambda$7/201251057.run(Unknown Source)
> at
> org.apache.felix.dm.impl.SerialExecutor.runTask(SerialExecutor.java:138)
> at
> org.apache.felix.dm.impl.SerialExecutor.runTasks(SerialExecutor.java:120)
> at
> org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:86)
> at
> org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:105)
> at
> org.apache.felix.dm.impl.ComponentImpl.start(ComponentImpl.java:424)
> at
> org.apache.felix.dm.impl.ComponentScheduler.add(ComponentScheduler.java:69)
> at
> org.apache.felix.dm.DependencyManager.add(DependencyManager.java:136)
> at
> com.alcatel.as.service.coordinator.impl.Activator.init(Activator.java:16)
> at
> org.apache.felix.dm.DependencyActivatorBase.start(DependencyActivatorBase.java:75)
> at
> org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
> at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl$DeployedBundle.start(BundleInstallerImpl.java:1056)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.startBundles(BundleInstallerImpl.java:728)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.finishInitialisation(BundleInstallerImpl.java:230)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:213)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.start(BundleInstallerImpl.java:174)
> at
> org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
> at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:2145)
> at org.apache.felix.framework.Felix.updateBundle(Felix.java:2506)
> at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1018)
> at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1004)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.checkSelfUpdate(BundleInstallerImpl.java:273)
> at
> com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:192)
> at
> org.apache.felix.framework.EventDispatcher.invokeFrameworkListenerCallback(EventDispatcher.java:881)
> at
> org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:830)
> at
> org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1147)
> at
> org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54)
> at
> org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102)
> at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {code}
> now, here is my first interpretation of what is going on:
> in SCR:
> =====
> Configuration Admin is delivering a configuration update event to SCR. So,
> the
> org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent()
> method is then called. This method then indirectly triggers a call to the
> SingleComponentManager.reconfigure method, which then calls
> SingleComponentManager.modify method. Here is the modify() method:
> {code}
> private boolean modify(boolean configurationDeleted)
> {
> //0 SCR 112.7.1 If configuration is deleted, and version is < 1.3 and
> no flag set, then deactivate unconditionally.
> // For version 1.3 and later, or with a flag, more sensible behavior
> is allowed.
> if ( configurationDeleted &&
> !getComponentMetadata().isDeleteCallsModify()){
> return false;
> }
> // 1. no live update if there is no declared method
> if ( getComponentMetadata().getModified() == null )
> {
> log( LogService.LOG_DEBUG, "No modified method, cannot update
> dynamically", null );
> return false;
> }
> // invariant: we have a modified method name
> // 2. get and check configured method
> // invariant: modify method is configured and found
> // 3. check whether we can dynamically apply the configuration if
> // any target filters influence the bound services
> final Map<String, Object> props = getProperties();
> for ( DependencyManager<S, ?> dm: getDependencyManagers() )
> {
> if ( !dm.canUpdateDynamically( props ) )
> {
> log( LogService.LOG_DEBUG,
> "Cannot dynamically update the configuration due to
> dependency changes induced on dependency {0}",
> new Object[] {dm.getName()}, null );
> return false;
> }
> }
> // invariant: modify method existing and no static bound service
> changes
> // 4. call method (nothing to do when failed, since it has already
> been logged)
> // (call with non-null default result to continue even if the
> // modify method call failed)
> obtainStateLock( );
> try
> {
> //cf 112.5.12 where invoking modified method before updating
> target services is specified.
> final MethodResult result = invokeModifiedMethod();
> updateTargets( props );
> if ( result == null )
> {
> // log an error if the declared method cannot be found
> log( LogService.LOG_ERROR, "Declared modify method ''{0}''
> cannot be found, configuring by reactivation",
> new Object[] {getComponentMetadata().getModified()},
> null );
> return false;
> }
> // 5. update the target filter on the services now, this may still
> // result in unsatisfied dependencies, in which case we abort
> // this dynamic update and have the component be deactivated
> if ( !verifyDependencyManagers() )
> {
> log( LogService.LOG_DEBUG,
> "Updating the service references caused at least one
> reference to become unsatisfied, deactivating component",
> null );
> return false;
> }
> // 6. update service registration properties if we didn't just do
> it
> if ( result.hasResult() )
> {
> setServiceProperties( result, null );
> }
> else
> {
> updateServiceRegistration();
> }
> ...
> {code}
> This is problematic because we see that the modify method is invoking the
> updateServiceRegistration while holding the StateLock. (the obtainStateLock()
> method is called and then the service registration is updaded).
> Here, the listener is a DM component, but this component is internally
> synchronized on it's service tracker, and is currently blocked, trying to
> dereference the DS service, but it can't because the DS service state lock is
> currently held by the modify method.
> in the next section, I'm now analyzing what is doing DM and will propose a
> patch in order to make sure DM does not hold any locks while calling
> bundleContext.getService(ref) ... this will fix the issue.
> Dependency Manager
> =================
> In dependency manager, now let's analyze the the FelixDispatchQueue
> stacktrace: we open the DM service tracker, but the open method (in the DM
> service tracker) calls the AbstractTracked.trackInitial while holding the
> lock on ServiceTracker$Tracked. Then this triggers the customizer
> "addingService" method call, which then perform a getService. But the
> getService then tries to dereference a declarative service component, and the
> stacktrace is then blocked because the DS component state lock is currently
> locked by the other "CM Event Dispatcher"
> So, in the end, I think that there is something wrong in both SCR and DM :
> 1) SCR obtain the state lock while update the service registration.
> This is , I think, unfortunate because updating the service registration
> while holding a lock is like hodling a lock while calling foreign code (you
> don't know what will do the service listener when you update the service
> registration).
> 2) and DM calls the customizer "addingService" method while holding the lock
> on the service tracker. This is also not satisfying.
> So, at minimul, in DM I do believe that we should manage to not invoke the
> customizer.addingService while holding the lock on the DM service tracker.
> now, I have given a look into the DM ServiceTracker.open method and it is
> currently implemented like this:
> {code}
> public void open(boolean trackAllServices, boolean trackAllAspects) {
> if (debug) {
> System.out.println("[ServiceTracker] " + debugKey + "
> T" + Thread.currentThread().getId() + " open");
> }
> final Tracked t;
> synchronized (this) {
> if (tracked != null) {
> return;
> }
> if (DEBUG) {
> System.out.println("ServiceTracker.open: " +
> filter);
> }
> m_trackAllAspects = trackAllAspects;
> t = trackAllServices ? new AllTracked() : new Tracked();
> synchronized (t) {
> try {
> context.addServiceListener(t,
> listenerFilter);
> ServiceReference[] references = null;
> if (trackClass != null) {
> references =
> getInitialReferences(trackAllServices,
> trackClass,
> null);
> }
> else {
> if (trackReference != null) {
> if
> (trackReference.getBundle() != null) {
> references =
> new ServiceReference[] {trackReference};
> }
> }
> else { /* user supplied filter
> */
> references =
> getInitialReferences(trackAllServices,
> null,
>
> (listenerFilter != null) ? listenerFilter
>
> : filter.toString());
> }
> }
> /* set tracked with the initial
> references */
> t.setInitial(references);
>
> // only actually schedules the actions
> for execution within this synchronized block,
> // but do the actual execution
> afterwards.
> t.trackInitial();
> }
> catch (InvalidSyntaxException e) {
> throw new RuntimeException(
> "unexpected
> InvalidSyntaxException: "
> +
> e.getMessage(), e);
> }
> }
> tracked = t;
> }
> /* Call tracked outside of synchronized region */
> // just trigger the executor
> t.getExecutor().execute();
> }
> {code}
> As you can see the "t.trackInitial()" is invoked from within the synchronized
> block. This method actually schedules customizer "added" callbacks, which
> will be invoked properly outside of the synchronized block (when
> t.getExecutor().executor() will be called).
> But please notice that the the issue is that the t.trackInitial() will also
> invoke the "addingService" customizer callback from the synchronized block.
> Doing so will end up dereferencing the DS service reference while we are
> holding the ServiceTracker lock.
> Interestingly, notice that the BundleTracker is not calling the
> t.trackInitial() method from the synchronized block.
> so, I have attempted to make a patch, which seems to resolve the issue. All
> tests are passing OK, and we don't experience the deadlock anymore, and I
> would appreciate if another DM developer or user could take a look at my
> proposed patch, to make sure I don't have messed up :-)
> thanks
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)