High contention (or deadlock) in PackageAdmin and StartLevel 
-------------------------------------------------------------

                 Key: FELIX-2400
                 URL: https://issues.apache.org/jira/browse/FELIX-2400
             Project: Felix
          Issue Type: Bug
          Components: Framework
    Affects Versions: framework-2.0.5
         Environment: Felix 2.0.5
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
SunOS castor 5.10 Generic_138888-06 sun4u sparc SUNW,Sun-Fire-V890
            Reporter: Alexander Berger


Imagine the following code:

void createProblem(PackageAdmin pa, StartLevel sl, Bundle bundles[], int level){
   for ( final Bundle b : bundles) {
      sl.setBundleStartLevel(b, level);
   }
   pa.refreshPackages(null);
   pa.resolveBundles(null);
}

If many bundles have been updated or uninstalled, the code above can produce 
what looks like a deadlock (see the stack traces below) but is in fact a 
high-contention problem. On our system (16-core Sun SPARCv9, 64 GB) with about 
20 bundles (all updated, so the refresh is busy), runtime performance is very 
poor: pa.resolveBundles(null) takes about 30 to 60 minutes to return.

The problem lies in the asynchronous nature of setBundleStartLevel and 
refreshPackages, combined with the way Felix uses locking (acquireGlobalLock 
and acquireBundleLock). For example, the following code works fine 
(pa.resolveBundles(null) returns within a few seconds), but it raises the 
question of how to implement "magicWait":

void createNoProblem(PackageAdmin pa, StartLevel sl, Bundle bundles[], int level){
   for ( final Bundle b : bundles) {
      sl.setBundleStartLevel(b, level);
   }
   // wait until the asynchronous sl.setBundleStartLevel logic has finished
   magicWait(sl);
   pa.refreshPackages(null);
   // wait until the asynchronous pa.refreshPackages logic has finished
   magicWait(pa); 
   pa.resolveBundles(null);
}
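
One possible shape for magicWait(pa) that avoids patching Felix is to block on 
a completion notification instead of polling. As far as I know, the OSGi 
PackageAdmin contract has the framework fire FrameworkEvent.PACKAGES_REFRESHED 
once an asynchronous refresh has finished, so a FrameworkListener that counts 
down a latch should work. The sketch below only demonstrates the latch 
pattern: Listener, refreshAsync and the PACKAGES_REFRESHED constant are 
hypothetical stand-ins so that it runs without an OSGi framework, and it has 
not been tried against Felix 2.0.5.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class RefreshLatchSketch {

    /** Hypothetical stand-in for org.osgi.framework.FrameworkListener. */
    public interface Listener {
        void frameworkEvent(int type);
    }

    /** Stand-in constant playing the role of FrameworkEvent.PACKAGES_REFRESHED. */
    public static final int PACKAGES_REFRESHED = 4;

    /** Hypothetical stand-in for an asynchronous refresh: works on another thread, then notifies. */
    public static void refreshAsync(final Listener listener) {
        final Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(50); // simulated refresh work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                listener.frameworkEvent(PACKAGES_REFRESHED);
            }
        }, "refresh-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    /** Starts the refresh and blocks until it is reported done; returns false on timeout. */
    public static boolean refreshAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        // The listener is in place before the refresh starts, so the event cannot be missed.
        refreshAsync(new Listener() {
            public void frameworkEvent(int type) {
                if (type == PACKAGES_REFRESHED) {
                    done.countDown();
                }
            }
        });
        return done.await(timeout, unit);
    }
}
```

In a real bundle the listener would be registered through 
BundleContext.addFrameworkListener before calling pa.refreshPackages(null) and 
removed afterwards. This sketch does not cover the StartLevel side.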

For now I have worked around the problem by patching PackageAdminImpl like 
this (I know this is an ugly solution, but it is only meant as a demonstration):

public boolean isDone() {
   synchronized(this) {
      final Bundle tmp[][] = m_reqBundles;
      return tmp == null || tmp.length == 0;
   }
}

And implementing magicWait like this:

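void magicWait(final PackageAdmin pa) throws Exception {
    // isDone() is the method patched into PackageAdminImpl above; look it up reflectively
    final Method method = pa.getClass().getMethod("isDone");
    method.setAccessible(true);
    // spin until the pending refresh requests have been processed
    while ( ! (Boolean)method.invoke(pa) ) {
       Thread.yield();
    }
}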
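
The loop above spins without a bound, so it never returns if the refresh never 
completes. A minimal sketch of a bounded variant, using only the JDK, is shown 
below; BoundedWait and its parameters are hypothetical names introduced for 
illustration, not part of Felix or of the patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class BoundedWait {

    private BoundedWait() {
    }

    /**
     * Polls the condition until it returns true, sleeping pollMillis between
     * checks, and throws TimeoutException once the timeout has elapsed.
     */
    public static void await(Callable<Boolean> condition, long timeout, TimeUnit unit, long pollMillis)
            throws Exception {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!condition.call()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeout + " " + unit);
            }
            // Sleep rather than yield so the waiting thread does not compete for CPU.
            Thread.sleep(pollMillis);
        }
    }
}
```

With the reflective lookup from magicWait, the condition would be a Callable 
that returns (Boolean) method.invoke(pa), for example with a timeout of a few 
minutes and a poll interval of 50 ms.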

Then I did something similar for StartLevel. 

For me this patch/workaround is fine for the moment, but I think the problem 
should be investigated and solved in the Felix framework.




"FelixPackageAdmin" daemon prio=3 tid=0x00000001005ac800 nid=0x1a in Object.wait() [0xffffffff4f6fe000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xffffffff554000e0> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4535)
        - locked <0xffffffff554000e0> (a [Ljava.lang.Object;)
        at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3314)
        at org.apache.felix.framework.PackageAdminImpl.run(PackageAdminImpl.java:331)
        at java.lang.Thread.run(Unknown Source)
   Locked ownable synchronizers:
        - None
        
"FelixStartLevel" daemon prio=3 tid=0x0000000100848000 nid=0x19 in Object.wait() [0xffffffff4f8fe000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xffffffff554000e0> (a [Ljava.lang.Object;)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4462)
        - locked <0xffffffff554000e0> (a [Ljava.lang.Object;)
        at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1266)
        at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
        at java.lang.Thread.run(Unknown Source)
   Locked ownable synchronizers:
        - None
        
"OSKi" prio=3 tid=0x00000001006ea800 nid=0x1b runnable [0xffffffff4f4fd000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.felix.framework.searchpolicy.ResolvedPackage.clone(ResolvedPackage.java:62)
        at org.apache.felix.framework.searchpolicy.Resolver.isClassSpaceConsistent(Resolver.java:846)
        at org.apache.felix.framework.searchpolicy.Resolver.isClassSpaceConsistent(Resolver.java:807)
        at org.apache.felix.framework.searchpolicy.Resolver.isClassSpaceConsistent(Resolver.java:807)
        at org.apache.felix.framework.searchpolicy.Resolver.findConsistentClassSpace(Resolver.java:549)
        at org.apache.felix.framework.searchpolicy.Resolver.resolve(Resolver.java:103)
        at org.apache.felix.framework.Felix$FelixResolver.resolve(Felix.java:3861)
        at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3292)
        at org.apache.felix.framework.Felix.resolveBundles(Felix.java:3267)
        at org.apache.felix.framework.PackageAdminImpl.resolveBundles(PackageAdminImpl.java:288)
        at Test.createProblem(Test.java:10)

