[
https://issues.apache.org/jira/browse/KARAF-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616478#comment-13616478
]
Amichai Rothman commented on KARAF-2256:
----------------------------------------
Today I came across another deadlock in a similar scenario, but this time
involving Apache CXF DOSGi rather than Pax Web. This happened while copying an
updated bundle to the Karaf deploy folder. Here too there is a call to
refreshBundles, which attempts to stop a bundle (while holding the global
bundle lock), which down the stack gets blocked waiting for another lock, and
that lock is being held by another thread which then tries to register a
service, which requires the global bundle lock... deadlocked.
This again might be a CXF bug, or a Felix bug, but in any case it causes Karaf
to hang indefinitely (as far as bundle operations are concerned, including
shutting down Karaf). Seeing this in two separate cases now, I'm inclined to
guess that either this is a Felix bug (that shouldn't be holding the locks
while refreshing packages or invoking callbacks, at least not the way it
currently does, or maybe needs to serialize the callbacks better), or maybe the
spec/api is not clear enough about what is allowed or not to do from within its
bundle and service callbacks, which these two (and probably other) bundles are
getting wrong - must be a nasty pitfall in there.
It would be great to hear what the OSGi experts think...
"FelixFrameworkWiring" daemon prio=10 tid=0x00007fe4a8003800 nid=0x65b0 waiting
for monitor entry [0x00007fe4817a4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminCore.removeImportRegistration(RemoteServiceAdminCore.java:399)
- waiting to lock <0x00000000eab912b8> (a java.util.LinkedHashMap)
at
org.apache.cxf.dosgi.dsw.service.ImportRegistrationImpl.close(ImportRegistrationImpl.java:99)
- locked <0x00000000eba3b648> (a
org.apache.cxf.dosgi.dsw.service.ImportRegistrationImpl)
at
org.apache.cxf.dosgi.topologymanager.importer.TopologyManagerImport.removeServiceInterest(TopologyManagerImport.java:137)
at
org.apache.cxf.dosgi.topologymanager.importer.ListenerHookImpl.removed(ListenerHookImpl.java:111)
at
org.apache.felix.framework.util.SecureAction.invokeServiceListenerHookRemoved(SecureAction.java:1226)
at
org.apache.felix.framework.Felix.removeServiceListener(Felix.java:3151)
at
org.apache.felix.framework.BundleContextImpl.removeServiceListener(BundleContextImpl.java:289)
at
org.apache.aries.blueprint.container.AbstractServiceReferenceRecipe.stop(AbstractServiceReferenceRecipe.java:155)
- locked <0x00000000ea9a68b0> (a java.lang.Object)
at
org.apache.aries.blueprint.container.BlueprintContainerImpl.untrackServiceReference(BlueprintContainerImpl.java:613)
at
org.apache.aries.blueprint.container.BlueprintContainerImpl.untrackServiceReferences(BlueprintContainerImpl.java:593)
at
org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:906)
at
org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:857)
at
org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:284)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:305)
at
org.apache.aries.blueprint.container.BlueprintExtender.modifiedBundle(BlueprintExtender.java:206)
at
org.apache.aries.util.tracker.hook.BundleHookBundleTracker$Tracked.customizerModified(BundleHookBundleTracker.java:500)
at
org.apache.aries.util.tracker.hook.BundleHookBundleTracker$Tracked.customizerModified(BundleHookBundleTracker.java:433)
at
org.apache.aries.util.tracker.hook.BundleHookBundleTracker$AbstractTracked.track(BundleHookBundleTracker.java:725)
at
org.apache.aries.util.tracker.hook.BundleHookBundleTracker$Tracked.bundleChanged(BundleHookBundleTracker.java:463)
at
org.apache.aries.util.tracker.hook.BundleHookBundleTracker$BundleEventHook.event(BundleHookBundleTracker.java:422)
at
org.apache.felix.framework.util.SecureAction.invokeBundleEventHook(SecureAction.java:1103)
at
org.apache.felix.framework.util.EventDispatcher.createWhitelistFromHooks(EventDispatcher.java:695)
at
org.apache.felix.framework.util.EventDispatcher.fireBundleEvent(EventDispatcher.java:483)
at org.apache.felix.framework.Felix.fireBundleEvent(Felix.java:4244)
at org.apache.felix.framework.Felix.stopBundle(Felix.java:2351)
at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
at
org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
at java.lang.Thread.run(Thread.java:722)
"pool-18-thread-3" prio=10 tid=0x00007fe4a4044800 nid=0x6c74 in Object.wait()
[0x00007fe47a781000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
- locked <0x00000000e9924fc0> (a [Ljava.lang.Object;)
at org.apache.felix.framework.Felix.registerService(Felix.java:3205)
at
org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
at
org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:320)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminCore.proxifyMatchingInterface(RemoteServiceAdminCore.java:336)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminCore.importService(RemoteServiceAdminCore.java:291)
- locked <0x00000000eab912b8> (a java.util.LinkedHashMap)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminInstance$2.run(RemoteServiceAdminInstance.java:117)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminInstance$2.run(RemoteServiceAdminInstance.java:110)
at java.security.AccessController.doPrivileged(Native Method)
at
org.apache.cxf.dosgi.dsw.service.RemoteServiceAdminInstance.importService(RemoteServiceAdminInstance.java:110)
at
org.apache.cxf.dosgi.topologymanager.importer.TopologyManagerImport.importService(TopologyManagerImport.java:282)
at
org.apache.cxf.dosgi.topologymanager.importer.TopologyManagerImport.importServices(TopologyManagerImport.java:253)
at
org.apache.cxf.dosgi.topologymanager.importer.TopologyManagerImport.access$300(TopologyManagerImport.java:53)
at
org.apache.cxf.dosgi.topologymanager.importer.TopologyManagerImport$2.run(TopologyManagerImport.java:202)
- locked <0x00000000e9cf1ef8> (a java.util.HashMap)
- locked <0x00000000e9cf23c8> (a java.util.HashMap)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
> Deadlock in pax web bundle when refreshing bundles
> --------------------------------------------------
>
> Key: KARAF-2256
> URL: https://issues.apache.org/jira/browse/KARAF-2256
> Project: Karaf
> Issue Type: Bug
> Affects Versions: 2.3.1
> Environment: 64-bit Linux Oracle JDK 1.7.0_17
> Reporter: Amichai Rothman
> Assignee: Achim Nierbeck
>
> When attempting to install the DOSGi feature (by running "features:chooseurl
> cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the
> installation hangs along with some of the bundles which can no longer be
> started, stopped, checked for imports, etc. - the Karaf server must be killed
> and restarted to resume. This is likely not related to this specific feature,
> and can happen with other refreshed bundles and installed features as well.
> At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime
> (1.1.12)" bundle being stuck in the stopping state due to a deadlock caused
> by its Activator:
> It receives a removedService notification from a service tracker, which is
> handled in a separate thread using a custom executor and eventually tries to
> resolve some bundle and ends up waiting for acquireGlobalLock indefinitely.
> This is because at the same time, Felix calls refreshPackages which attempts
> to stop the bundle (while holding the lock), whose activator puts a cleanup
> task in its custom executor and then attempts to shut down the executor. This
> never happens, because the previous executor task initiated from
> removeService is waiting for the lock, hence the deadlock.
> I'm not entirely sure which of the projects has the underlying bug in it -
> probably pax web, possibly Felix if the OSGi specs allow for the behavior
> that hangs it, but in any case Karaf is using these versions and exhibiting
> the deadlock, so at the very least should upgrade to fixed versions of these
> libraries, or patch them.
> If anyone who knows these systems better thinks it should be reported in one
> of the upstream projects, point me in the right direction and I'll be happy
> to do it.
> Here is the thread dump, the top two threads show the deadlock, and the other
> two are bundles which are stuck as well due to waiting for the same lock (I
> think).
> "FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in
> Object.wait() [0x00007f3948dd3000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000f1b8ea50> (a java.lang.Object)
> at java.lang.Object.wait(Object.java:503)
> at
> org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91)
> - locked <0x00000000f1b8ea50> (a java.lang.Object)
> at
> org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140)
> at
> org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)
> at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361)
> at
> org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
> at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
> at
> org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
> at java.lang.Thread.run(Thread.java:722)
> "Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in
> Object.wait() [0x00007f390dfa8000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at
> org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219)
> at
> org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539)
> at
> org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439)
> at
> org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72)
> at
> org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at
> org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317)
> at
> org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521)
> at
> org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
> at
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
> at
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
> at
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
> at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
> at org.apache.felix.framework.Felix.access$000(Felix.java:74)
> at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
> at
> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
> at
> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
> at
> org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231)
> at
> org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387)
> at
> org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45)
> at
> org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122)
> "fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10
> tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
> at
> org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247)
> at
> org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219)
> at
> org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208)
> at
> org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503)
> at
> org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291)
> "NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait()
> [0x00007f394a064000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931)
> at
> org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479)
> at
> org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396)
> at
> org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392)
> at
> org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
> at
> org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41)
> at
> org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
> at
> org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
> at
> org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
> at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
> at
> org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
> at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
> at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
> at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
> at
> org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
> at
> org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107)
> at
> org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388)
> at
> org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235)
> at
> org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195)
> at
> org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057)
> at
> org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229)
> at
> org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205)
> at
> org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566)
> at
> org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236)
> - locked <0x00000000efd56b00> (a java.lang.Object)
> at
> org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
> at
> org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
> at
> org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
> at
> org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
> at
> org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
> at
> org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
> at
> org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
> at
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
> at
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira