Hi Felix people,

I've come back to revisit this issue, where threads are deadblocking during classloading when running under JamVM and Felix 4.3.1. Having now completed other deployments, it's now critical for me to get this working ASAP, therefore any help you can offer is tremendously gratefully received.

I've removed a cyclic dependency, as Karl suggested, which I believe reduced the occurrence rate. I've also started to test without Retrotranslation, which hasn't had an effect.

A common element to the deadlock is SCR:

"FelixStartLevel" (daemon) 0x1aaf9a0 priority: 5 tid: 0x7f69ce3a8700 id: 6 state: BLOCKED (5)
at java.lang.ClassLoader.findLoadedClass(ClassLoader.java:580)
at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1948)
at java.lang.ClassLoader.loadClass(ClassLoader.java:293)
at org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1374)
at org.apache.felix.framework.BundleWiringImpl.searchImports(BundleWiringImpl.java:1553)
at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1484)
at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:75)
at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1955)
at java.lang.ClassLoader.loadClass(ClassLoader.java:293)
at java.lang.VMClass.getDeclaredMethods(Native method)
at java.lang.Class.getDeclaredMethods(Class.java:536)
at java.lang.Class.getDeclaredMethod(Class.java:506)
at org.apache.felix.scr.impl.helper.BaseMethod.getMethod(BaseMethod.java:328)
at org.apache.felix.scr.impl.helper.BindMethod.getMethod(BindMethod.java:38)
at org.apache.felix.scr.impl.helper.BindMethod.getServiceReferenceMethod(BindMethod.java:350)
at org.apache.felix.scr.impl.helper.BindMethod.doFindMethod(BindMethod.java:99)
at org.apache.felix.scr.impl.helper.BaseMethod.findMethod(BaseMethod.java:185)
at org.apache.felix.scr.impl.helper.BaseMethod.access$400(BaseMethod.java:37)
at org.apache.felix.scr.impl.helper.BaseMethod$NotResolved.resolve(BaseMethod.java:555)
at org.apache.felix.scr.impl.helper.BaseMethod$NotResolved.methodExists(BaseMethod.java:578)
at org.apache.felix.scr.impl.helper.BaseMethod.methodExists(BaseMethod.java:510)
at org.apache.felix.scr.impl.helper.BindMethod.methodExists(BindMethod.java:38)
at org.apache.felix.scr.impl.helper.BindMethod.getServiceObject(BindMethod.java:560)
at org.apache.felix.scr.impl.manager.DependencyManager.prebind(DependencyManager.java:1091)
at org.apache.felix.scr.impl.manager.AbstractComponentManager.collectDependencies(AbstractComponentManager.java:768)
at org.apache.felix.scr.impl.manager.AbstractComponentManager$Unsatisfied.activate(AbstractComponentManager.java:1492)
at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:550)
at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:261)
at org.apache.felix.scr.impl.config.ImmediateComponentHolder.enableComponents(ImmediateComponentHolder.java:328)
at org.apache.felix.scr.impl.BundleComponentActivator.initialize(BundleComponentActivator.java:158)
at org.apache.felix.scr.impl.BundleComponentActivator.<init>(BundleComponentActivator.java:113)
at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:261)
at org.apache.felix.scr.impl.Activator.loadAllComponents(Activator.java:199)
at org.apache.felix.scr.impl.Activator.start(Activator.java:108)
at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:645)
at org.apache.felix.framework.Felix.activateBundle(Felix.java:2146)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2064)
at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1291)
at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
at java.lang.Thread.run(Thread.java:745)

After taking this trace, I changed SCR to start before the bundles it should load, the deadlock still happens with a similar stack but now from org.apache.felix.scr.impl.Activator.bundleChanged. In fact, having tried to avoid the issue by using start levels and loading bundles in a natural order (previously everything was getting started via auto-deploy), the deadlock occurs every time (whereas it was down to more like 1 in 20 with auto-deploy).

A couple of examples of the other blocked threads are attached.

"Agent Heartbeat" (daemon) 0x7fc5a8098bf0 priority: 5 tid: 0x7fc5a67fc700 id: 
12 state: BLOCKED (5)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.findClass(BundleWiringImpl.java:2038)
        at 
org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1501)
        at 
org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:75)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1955)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:293)
        at 
com.rabbitmq.client.MessageProperties.<clinit>(MessageProperties.java:29)
        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:634)
        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:619)
        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:610)
        at 
com.redbite.rededge.obs.heartbeat.agent.AgentHeartbeat$2.run(AgentHeartbeat.java:39)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:316)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:149)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$1(ScheduledThreadPoolExecutor.java:1)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:679)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:704)
        at java.lang.Thread.run(Thread.java:745)
"AMQP Connection 192.168.1.251:5672" (daemon) 0x7f69b40839e0 priority: 5 tid: 
0x7f69cd1a4700 id: 8 state: BLOCKED (5)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.findClass(BundleWiringImpl.java:2038)
        at 
org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1501)
        at 
org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:75)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1955)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:293)
        at java.lang.VMClassLoader.defineClass(Native method)
        at 
java.lang.VMClassLoader.defineClassWithTransformers(VMClassLoader.java:343)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:471)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.findClass(BundleWiringImpl.java:2279)
        at 
org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1501)
        at 
org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:75)
        at 
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1955)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:293)
        at com.rabbitmq.client.impl.AMQImpl.readMethodFrom(AMQImpl.java:3274)
        at 
com.rabbitmq.client.impl.CommandAssembler.consumeMethodFrame(CommandAssembler.java:93)
        at 
com.rabbitmq.client.impl.CommandAssembler.handleFrame(CommandAssembler.java:158)
        at com.rabbitmq.client.impl.AMQCommand.handleFrame(AMQCommand.java:87)
        at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:89)
        at 
com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:533)

In every case, FelixStartLevel blocks at ClassLoader.java:580 (via SCR and BundleWiringImpl.java:1948 loadClass) and BundleWiringImpl.java:2038 findClass (though that line seems inconspicuous - if (!hooks.isEmpty()), I don't readily see how a block could happen there - hooks is a TreeMap?). I added some debug output to BundleWiringImpl.java findClass and loadClass methods (no debugger with JamVM):


        @Override
        protected Class loadClass(final String name, final boolean resolve)
            throws ClassNotFoundException
        {
        System.out.println(getBundle().getSymbolicName() +" loadClass "+ name +", "+ resolve);

------


        @Override
        protected Class findClass(final String name) throws ClassNotFoundException
        {
        System.out.println(getBundle().getSymbolicName() +" findClass "+ name);



The output I get, leads me to believe that the deadlock is not particular to a bundle or class. For example (the last findClass and loadClass highlighted):

...
org.apache.felix.gogo.command findClass org.osgi.service.log.LogEntry obs.logging.amqp loadClass com.redbite.rededge.obs.interfaces.deployment.DeviceRegistry, false obs.interfaces.deployment loadClass com.redbite.rededge.obs.interfaces.deployment.DeviceRegistry, false obs.logging.amqp loadClass com.rabbitmq.client.Channel, false com.redbite.osgi.com.rabbitmq.amqp-client loadClass com.rabbitmq.client.Channel, false


Do you think I am interpreting this correctly? Now though, as well as blocking on [what used to be] line 2038 (if (!hooks.isEmpty())), it actually now  blocks on the line I've added, System.out.println(getBundle().getSymbolicName() +" findClass "+ name);

Any ideas on what I can do to pursue this further, gratefully received.
Best, Dan.

On 12 Nov 2013, at 11:18, Karl Pauls wrote:

On the retrotranslator bundles, they are made available by specifying a
dynamic-import which is added to the manifest during retrotranslation (so
that our code/libraries can run on).


I would try to change that and either have that bundles each contain their
own copy or to not have them mention it at all in the manifest and just put
them on the class path of the framework and boot delegate.


Unfortunately, running unretrotranslated does not help.


Hm, in that case something else must be going on - however, in your stack
trace it clearly all happens inside the retrotranslate part so you might
have to give me a stack trace of the unretrotranslated one as well.

regards,

Karl

p.s.: feel free to also share your manifests if that is possible


Thanks a lot for your help,
Dan.

On 12 Nov 2013, at 09:51, Karl Pauls wrote:

Well, my guess would be that you are running into some form of a circular
class load issue. The point is that the class loader in java is
synchronized which my lead to deadlocks if you have a cycle in your class
loaders. This is a long standing issue with frameworks like OSGi which
allow for cycles in the dependencies.

I don't know enough about your app to say for sure nor to help you with
this but look at the dependencies of your bundles and see whether they
create a cycle (i.e., Bundle A depends via some import on Bundle B which
depends via some import on Bundle C which depends via some import on
Bundle
A or something similar). In your case, it might be even more complicated
as
the retrotranslator seems to do some class loading magic by itself which
might be responsible for the issue - how do you make the
net.sf.retrotranslator.*
packages available to the bundles?

regards,

Karl


On Fri, Nov 8, 2013 at 10:44 PM, Daniel McGreal <[email protected]>
wrote:

Hello Felix developer community,

I am having an interesting (and critical) problem running my application
with Felix in an embedded environment where an alternate JVM, JamVM is
necessary.

The deadlock is non-deterministic with the deadlocked stack traces
blocking at odd places (though they might not seem odd to people used to
messing with class loading - but a block on new XXX() to me is quite the
novelty). An example of the relevant parts of the thread stack is in
FelixJamVMDeadlockStackTrace.txt, attached. It shows a lock between:


 -
 - java.lang.ClassLoader.findLoadedClass(ClassLoader.java:580) looks
 like:

   protected final synchronized Class<?> findLoadedClass(String name)
  {
     checkInitialized();
     return VMClassLoader.findLoadedClass(this, name);
  }

 where line 580 is checkInitialized() which throws an exception if
 isInitialized is false.

 -
net.sf.retrotranslator.runtime.asm.ClassReader.readConst(ClassReader.java:1591)
 looks like:

     public Object readConst(final int item, final char[] buf) {
         int index = items[item];
         switch (b[index - 1]) {
             case ClassWriter.INT:
                 return new Integer(readInt(index));
             case ClassWriter.FLOAT:
                 return new
Float(Float.intBitsToFloat(readInt(index)));
             case ClassWriter.LONG:
                 return new Long(readLong(index));
             case ClassWriter.DOUBLE:
                 return
newDouble(Double.longBitsToDouble(readLong(index)));
             case ClassWriter.CLASS:
                 String s = readUTF8(index, buf);
                 return Type.getType(s.charAt(0) == '[' ? s : "L" + s +
 ";");
             // case ClassWriter.STR:
             default:
                 return readUTF8(index, buf);
         }
     }
 where line 1591 is return new Integer(readInt(index));

 -
 -
 -
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.findClass(BundleWiringImpl.java:2038)
 is a big method, where line 2038 looks inconspicuously like:

 if (!hooks.isEmpty())


Gogo and Retrotranslator are red herrings, without them in the
application
the deadlock just happens somewhere else.

I don't even vaguely understand how, but neither Eclipse nor
Knopflerfish
suffer from this problem. I would, however, much rather use Felix.

Please let me know if there's any further information I can provide.

Best, Dan.








--
Karl Pauls
[email protected]
http://twitter.com/karlpauls
http://www.linkedin.com/in/karlpauls
https://profiles.google.com/karlpauls




--
Karl Pauls
[email protected]
http://twitter.com/karlpauls
http://www.linkedin.com/in/karlpauls
https://profiles.google.com/karlpauls

Reply via email to