Amichai Rothman created KARAF-2256:
--------------------------------------

             Summary: Deadlock in pax web bundle when refreshing bundles
                 Key: KARAF-2256
                 URL: https://issues.apache.org/jira/browse/KARAF-2256
             Project: Karaf
          Issue Type: Bug
    Affects Versions: 2.3.1
         Environment: 64-bit Linux Oracle JDK 1.7.0_17
            Reporter: Amichai Rothman


When attempting to install the DOSGi feature (by running "features:chooseurl 
cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the 
installation hangs along with some of the bundles which can no longer be 
started, stopped, checked for imports, etc. - the Karaf server must be killed 
and restarted to resume. This is likely not related to this specific feature, 
and can happen with other refreshed bundles and installed features as well.

At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime 
(1.1.12)" bundle being stuck in the stopping state due to a deadlock caused by 
its Activator:

It receives a removedService notification from a service tracker, which is 
handled in a separate thread using a custom executor and eventually tries to 
resolve some bundle and ends up waiting for acquireGlobalLock indefinitely.

This is because at the same time, Felix calls refreshPackages which attempts to 
stop the bundle (while holding the lock), whose activator puts a cleanup task 
in its custom executor and then attempts to shut down the executor. This never 
happens, because the previous executor task initiated from removeService is 
waiting for the lock, hence the deadlock.

I'm not entirely sure which of the projects has the underlying bug in it - 
probably pax web, possibly Felix if the OSGi specs allow for the behavior that 
hangs it, but in any case Karaf is using these versions and exhibiting the 
deadlock, so at the very least should upgrade to fixed versions of these 
libraries, or patch them.

If anyone who knows these systems better thinks it should be reported in one of 
the upstream projects, point me in the right direction and I'll be happy to do 
it.


Here is the thread dump, the top two threads show the deadlock, and the other 
two are bundles which are stuck as well due to waiting for the same lock (I 
think).

"FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in 
Object.wait() [0x00007f3948dd3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f1b8ea50> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:503)
        at 
org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91)
        - locked <0x00000000f1b8ea50> (a java.lang.Object)
        at org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140)
        at 
org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)
        at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361)
        at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
        at 
org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
        at java.lang.Thread.run(Thread.java:722)


"Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in 
Object.wait() [0x00007f390dfa8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944)
        - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
        at 
org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219)
        at 
org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539)
        at 
org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439)
        at 
org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at 
org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317)
        at 
org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521)
        at 
org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
        at 
org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
        at 
org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
        at 
org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
        at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
        at org.apache.felix.framework.Felix.access$000(Felix.java:74)
        at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
        at 
org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
        at 
org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
        at 
org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231)
        at 
org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387)
        at 
org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45)
        at 
org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122)


"fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10 
tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
        - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503)
        at 
org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291)


"NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait() 
[0x00007f394a064000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
        - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
        at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931)
        at 
org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479)
        at 
org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396)
        at 
org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392)
        at 
org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
        at 
org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41)
        at 
org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
        at 
org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
        at 
org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
        at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
        at 
org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
        at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
        at 
org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
        at 
org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107)
        at 
org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388)
        at 
org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235)
        at 
org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195)
        at 
org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057)
        at 
org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229)
        at 
org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205)
        at 
org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566)
        at 
org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236)
        - locked <0x00000000efd56b00> (a java.lang.Object)
        at 
org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
        at 
org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
        at 
org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
        at 
org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
        at 
org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
        at 
org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
        at 
org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at 
org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
        at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)







--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to