[
https://issues.apache.org/jira/browse/KARAF-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619610#comment-13619610
]
Amichai Rothman commented on KARAF-2256:
----------------------------------------
Great point. I've switched to Equinox, and in the fiddling I've done since
then I couldn't reproduce the deadlock. This is not conclusive (evidence about
race conditions and deadlocks rarely is), but Felix now looks like the main
suspect.
> Deadlock when refreshing bundles
> --------------------------------
>
> Key: KARAF-2256
> URL: https://issues.apache.org/jira/browse/KARAF-2256
> Project: Karaf
> Issue Type: Bug
> Affects Versions: 2.3.1
> Environment: 64-bit Linux Oracle JDK 1.7.0_17
> Reporter: Amichai Rothman
> Assignee: Achim Nierbeck
>
> When attempting to install the DOSGi feature (by running "features:chooseurl
> cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the
> installation hangs, along with some of the bundles, which can no longer be
> started, stopped, checked for imports, etc. The Karaf server must be killed
> and restarted to recover. This is likely not specific to this feature, and
> can happen with other refreshed bundles and installed features as well.
> At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime
> (1.1.12)" bundle being stuck in the stopping state due to a deadlock caused
> by its Activator:
> It receives a removedService notification from a service tracker, which is
> handled in a separate thread using a custom executor and eventually tries to
> resolve some bundle and ends up waiting for acquireGlobalLock indefinitely.
> This is because at the same time, Felix calls refreshPackages, which attempts
> to stop the bundle (while holding the lock). The bundle's activator puts a
> cleanup task in its custom executor and then attempts to shut down the
> executor. The shutdown never completes, because the earlier executor task
> initiated from removedService is still waiting for the lock, hence the
> deadlock.
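
The cycle described above can be reduced to a few lines of plain Java. This is
only a sketch under assumptions, not code from Pax Web or Felix: a
ReentrantLock stands in for Felix's global lock, a single-thread
ExecutorService stands in for the Pax Web Executor, and the class name
RefreshDeadlockSketch and its simulate method are made up for illustration.
Timeouts are used so that the example terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class RefreshDeadlockSketch {

    /**
     * Simulates the refresh/stop sequence (hypothetical stand-ins only).
     * Returns { executor terminated while the lock was held,
     *           executor terminated after the lock was released }.
     */
    static boolean[] simulate(long waitMillis) throws InterruptedException {
        // Assumption: stand-in for the framework's global lock.
        ReentrantLock globalLock = new ReentrantLock();
        // Assumption: stand-in for the activator's single worker thread.
        ExecutorService executor = Executors.newSingleThreadExecutor();

        boolean terminatedWhileLocked;
        globalLock.lock(); // the refresh thread takes the lock before stopping the bundle
        try {
            // Task triggered by removedService: it needs the global lock to resolve.
            executor.execute(() -> {
                globalLock.lock();
                globalLock.unlock();
            });
            // stop(): queue a cleanup task, then wait for the executor to drain.
            executor.execute(() -> { /* cleanup would go here */ });
            executor.shutdown();
            // The worker is blocked on the lock this thread holds, so this times out.
            terminatedWhileLocked = executor.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            globalLock.unlock();
        }
        // With the lock released, the blocked task and the cleanup task can finish.
        boolean terminatedAfterUnlock = executor.awaitTermination(5, TimeUnit.SECONDS);
        return new boolean[] { terminatedWhileLocked, terminatedAfterUnlock };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = simulate(300);
        System.out.println("terminated while lock held: " + result[0]);
        System.out.println("terminated after unlock:    " + result[1]);
    }
}
```

While the lock is held, the first wait times out (false). In the thread dump
the corresponding wait in Executor.shutdown has no timeout, which is why the
bundle stays stuck in the stopping state. Once the lock is released the
executor drains normally (true).
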
> I'm not entirely sure which of the projects has the underlying bug in it -
> probably Pax Web, possibly Felix if the OSGi specs allow for the behavior
> that hangs it, but in any case Karaf is using these versions and exhibiting
> the deadlock, so at the very least it should upgrade to fixed versions of
> these libraries, or patch them.
> If anyone who knows these systems better thinks it should be reported in one
> of the upstream projects, point me in the right direction and I'll be happy
> to do it.
> Here is the thread dump. The top two threads show the deadlock; the other
> two are threads starting bundles, which are stuck as well, waiting for the
> same lock (I think).
>
> "FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in Object.wait() [0x00007f3948dd3000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000f1b8ea50> (a java.lang.Object)
> at java.lang.Object.wait(Object.java:503)
> at org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91)
> - locked <0x00000000f1b8ea50> (a java.lang.Object)
> at org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140)
> at org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667)
> at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361)
> at org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629)
> at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951)
> at org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172)
> at java.lang.Thread.run(Thread.java:722)
>
> "Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in Object.wait() [0x00007f390dfa8000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219)
> at org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539)
> at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439)
> at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72)
> at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317)
> at org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521)
> at org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
> at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
> at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
> at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
> at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
> at org.apache.felix.framework.Felix.access$000(Felix.java:74)
> at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
> at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
> at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
> at org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231)
> at org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387)
> at org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45)
> at org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122)
>
> "fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10 tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
> at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247)
> at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219)
> at org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208)
> at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503)
> at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291)
>
> "NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait() [0x00007f394a064000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871)
> - locked <0x00000000e990e018> (a [Ljava.lang.Object;)
> at org.apache.felix.framework.Felix.startBundle(Felix.java:1744)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944)
> at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931)
> at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479)
> at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396)
> at org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392)
> at org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62)
> at org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41)
> at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38)
> at org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35)
> at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78)
> at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474)
> at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400)
> at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
> at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
> at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
> at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
> at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107)
> at org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388)
> at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235)
> at org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195)
> at org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057)
> at org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229)
> at org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205)
> at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566)
> at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236)
> - locked <0x00000000efd56b00> (a java.lang.Object)
> at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
> at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
> at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
> at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
> at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
> at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
> at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
> at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
> at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)