Amichai Rothman created KARAF-2256:
--------------------------------------
Summary: Deadlock in pax web bundle when refreshing bundles
Key: KARAF-2256
URL: https://issues.apache.org/jira/browse/KARAF-2256
Project: Karaf
Issue Type: Bug
Affects Versions: 2.3.1
Environment: 64-bit Linux Oracle JDK 1.7.0_17
Reporter: Amichai Rothman
When attempting to install the DOSGi feature (by running "features:chooseurl
cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the
installation hangs, and some of the bundles can no longer be started, stopped,
checked for imports, etc. The Karaf server must be killed and restarted to
recover. This is likely not specific to this feature, and can happen with
other refreshed bundles and installed features as well.
At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime
(1.1.12)" bundle being stuck in the stopping state due to a deadlock caused by
its Activator:
It receives a removedService notification from a service tracker, which is
handled in a separate thread using a custom executor. That task eventually
tries to resolve some bundle and ends up waiting in acquireGlobalLock
indefinitely. This is because, at the same time, Felix calls refreshPackages,
which attempts to stop the bundle while holding the lock. The bundle's
activator puts a cleanup task in its custom executor and then attempts to shut
down the executor. The shutdown never completes, because the earlier executor
task initiated from removedService is still waiting for the lock, hence the
deadlock.
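
The cycle described above can be reproduced in miniature. The following is
only a sketch under stated assumptions, not the Pax Web or Felix code: a
java.util.concurrent single-thread executor stands in for Pax Web's custom
Executor, a ReentrantLock stands in for the Felix global lock, and the class
and method names (RefreshDeadlockSketch, stopBundle) are hypothetical. A
timeout is used so the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class RefreshDeadlockSketch {

    /**
     * Simulates Activator.stop() being called by the refresh thread.
     * Returns true if the worker finished within the timeout.
     */
    public static boolean stopBundle(boolean holdGlobalLock, long timeoutMillis)
            throws InterruptedException {
        // Hypothetical stand-in for the framework's global lock.
        ReentrantLock globalLock = new ReentrantLock();
        // Hypothetical stand-in for the bundle's single worker thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch refreshReady = new CountDownLatch(1);

        // Task queued by the removedService callback: it needs the global lock.
        worker.submit(() -> {
            try {
                refreshReady.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            globalLock.lock();   // blocks while the refresh thread owns the lock
            globalLock.unlock();
        });

        // Refresh thread: optionally take the lock, then stop the bundle.
        if (holdGlobalLock) {
            globalLock.lock();
        }
        try {
            refreshReady.countDown();
            worker.shutdown();
            // An unbounded wait here is what turns the cycle into a hang.
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            if (holdGlobalLock) {
                globalLock.unlock();  // lets the blocked worker task finish
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held:     terminated=" + stopBundle(true, 200));
        System.out.println("lock not held: terminated=" + stopBundle(false, 2000));
    }
}
```

With the lock held, the worker cannot finish before the timeout, so the first
call reports terminated=false. In the reported hang the equivalent wait has no
timeout, so neither thread ever makes progress.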
I'm not entirely sure which of the projects has the underlying bug in it -
probably Pax Web, possibly Felix if the OSGi specs allow for the behavior that
hangs it. In any case, Karaf is using these versions and exhibiting the
deadlock, so at the very least Karaf should upgrade to fixed versions of these
libraries, or patch them.
If anyone who knows these systems better thinks it should be reported in one of
the upstream projects, point me in the right direction and I'll be happy to do
it.
Here is the thread dump. The top two threads show the deadlock, and the other
two are threads starting bundles, which are stuck as well because they are
waiting for the same lock (I think).
"FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in
Object.wait() [0x00007f3948dd3000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f1b8ea50> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91)
- locked <0x00000000f1b8ea50> (a java.lang.Object)
at org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140)
at org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)
at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361)
at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
at java.lang.Thread.run(Thread.java:722)
"Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in
Object.wait() [0x00007f390dfa8000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
at java.lang.Object.wait(Object.java:503)
at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944)
- locked <0x00000000e990e018> (a [Ljava.lang.Object;)
at org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219)
at org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539)
at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439)
at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72)
at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317)
at org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521)
at org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
at org.apache.felix.framework.Felix.access$000(Felix.java:74)
at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
at org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231)
at org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387)
at org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45)
at org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122)
"fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10
tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
at java.lang.Object.wait(Object.java:503)
at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
- locked <0x00000000e990e018> (a [Ljava.lang.Object;)
at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291)
"NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait()
[0x00007f394a064000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
at java.lang.Object.wait(Object.java:503)
at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
- locked <0x00000000e990e018> (a [Ljava.lang.Object;)
at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931)
at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479)
at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396)
at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392)
at org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
at org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41)
at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
at org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107)
at org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388)
at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235)
at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195)
at org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057)
at org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229)
at org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205)
at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566)
at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236)
- locked <0x00000000efd56b00> (a java.lang.Object)
at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)