Hi,

I recently got a number of stack traces in which a varying number of threads are blocked on "m_bundleLock.wait()" inside Felix.acquireBundleLock. From pure code inspection, it should not be possible for the framework to stall there.

It seems that Guillaume Nodet also hit this problem and added code to throw an IllegalStateException if the wait() call is interrupted (FELIX-2784 [1]).

At other times I have hit a similar issue in the Felix.acquireGlobalLock method, for which I reported FELIX-3067 [2].

It looks like both lock acquisition methods are prone to some kind of deadlock. The acquireGlobalLock method can report a failure through its return code. The acquireBundleLock method has no way to fail other than by throwing an exception.

I think that, similar to my FELIX-3067 proposal, the acquireBundleLock method should wait only for a limited time and then retry acquiring the lock. Only if this fails a number of times should acquisition be aborted and the method fail with an IllegalStateException.

WDYT?

Regards
Felix

PS: We are still running 3.0.8, but acquireBundleLock and acquireGlobalLock are the same as in trunk (except for Guillaume's FELIX-2784 fix).

[1] https://issues.apache.org/jira/browse/FELIX-2784
[2] https://issues.apache.org/jira/browse/FELIX-3067
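
To make the bounded wait-and-retry idea above more concrete, here is a minimal sketch. It is not the actual Felix code: the class BoundedBundleLock, the m_owner field, and the WAIT_MILLIS and MAX_ATTEMPTS values are all hypothetical, and the sketch deliberately leaves out the per-bundle state and any reentrancy that the real acquireBundleLock may track. It only shows the shape of "wait for a limited time, retry, and finally fail with an IllegalStateException".

```java
// Hypothetical sketch of a bounded wait-and-retry lock; not the Felix implementation.
class BoundedBundleLock {
    private static final long WAIT_MILLIS = 50; // assumed wait per attempt
    private static final int MAX_ATTEMPTS = 3;  // assumed retry limit

    private final Object m_bundleLock = new Object();
    private Thread m_owner = null; // stand-in for the real per-bundle lock state

    void acquire() {
        synchronized (m_bundleLock) {
            int attempts = 0;
            while (m_owner != null) {
                // Give up once the retry budget is exhausted instead of waiting forever.
                if (attempts++ >= MAX_ATTEMPTS) {
                    throw new IllegalStateException(
                        "Unable to acquire bundle lock after " + MAX_ATTEMPTS
                        + " attempts; held by " + m_owner.getName());
                }
                try {
                    // Timed wait: returns on notifyAll() or after WAIT_MILLIS.
                    m_bundleLock.wait(WAIT_MILLIS);
                } catch (InterruptedException ex) {
                    // Restore the interrupt flag and fail, as in the FELIX-2784 approach.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(
                        "Interrupted while waiting for bundle lock", ex);
                }
            }
            m_owner = Thread.currentThread();
        }
    }

    void release() {
        synchronized (m_bundleLock) {
            if (m_owner != Thread.currentThread()) {
                throw new IllegalStateException("Lock not held by current thread");
            }
            m_owner = null;
            m_bundleLock.notifyAll(); // wake up any waiting threads
        }
    }

    public static void main(String[] args) {
        BoundedBundleLock lock = new BoundedBundleLock();
        lock.acquire();
        try {
            lock.acquire(); // non-reentrant in this sketch, so this gives up
        } catch (IllegalStateException ex) {
            System.out.println("gave up: " + ex.getMessage());
        }
        lock.release();
    }
}
```

With these assumed values a caller is blocked for roughly MAX_ATTEMPTS * WAIT_MILLIS before the exception is thrown, so a stalled lock shows up as an error with the owning thread's name rather than as a thread parked in wait() indefinitely.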