Hi David,

Sorry for the delay ...

On 21.02.2012 at 23:56, David Jencks wrote:
> Hi Felix,
>
> Sorry for the delay, I've been struggling to come up with an independent way
> to reproduce this and finally have something similar in an integration test
> which I've attached to the jira with a stack trace.
>
> By messing with the timing I've gotten into a state where we're trying to log
> on a BundleComponentActivator that has been shut down. The activity being
> logged is different, but the NPE is the same. Admittedly I added the log
> statement in question....
>
> The basic scenario is two threads shutting stuff down concurrently, and an
> unbind method that takes a long time to return.
>
> bundle A has a service s1
> bundle B has a service s2 with a reference to s1, and an unbind method that
> takes a long time.
>
> Stop A in thread 1; the thread gets stuck in unbind of s2.
>
> Stop B in thread 2; this shuts down B so its BundleComponentActivator no
> longer can log (the NPE).
>
> If the unbind method in thread 1 now returns, it will try to log in B's BCA
> which has no log service tracker.
>
> At the moment I'm inclined to think that there is no way to prevent these
> kinds of races between bundles shutting down and the best thing to do is null
> checks in many places and catching exceptions from attempts to use stopped
> bundle contexts; this is what my first two patches do.

Yes, I agree -- I've also done the null-check thing in a couple of places.

> BTW to get my test to work at all I started upgrading scr to more recent pax
> bundles. I think this would slightly simplify the code. Also I think the
> more modern way to run integration tests is with the maven-failsafe-plugin
> rather than a separate surefire plugin execution. Would you be interested in
> a patch for this?

Yes. I once tried upgrading but failed because I don't know too much about it
and I used to use features which are not directly supported any longer ...

Regards
Felix

> thanks
> david jencks
>
>
> On Feb 16, 2012, at 12:41 AM, Felix Meschberger wrote:
>
>> Hi David,
>>
>> Thanks for reporting. I have seen the reported issue. I will look into it
>> ASAP.
>>
>> My gut feeling tells me that some issues are just a question of doing
>> null-checks properly while others might be more involved ...
>>
>> I assume you have a scenario where you can reproduce reliably?
>>
>> Regards
>> Felix
>>
>> On 15.02.2012 at 07:55, David Jencks wrote:
>>
>>> see FELIX-3345
>>>
>>> We've been seeing intermittent exceptions from SCR which generally seem to
>>> look like trying to unget a service on a bundle context,
>>> BundleComponentActivator, ComponentManager, or DependencyManager that are
>>> shut down or being shut down. I think there are 2 threads shutting bundles
>>> down at once. I'm not making much progress investigating exactly how this
>>> happens so I'd really appreciate it if one of the experts could take a look
>>> at the stack traces in the issue and attempt to guess whether there's a
>>> real concurrency bug or if the situations we're seeing are expected when
>>> more than one thread is shutting down bundles at once, and the "don't throw
>>> an exception" patch I provided would be appropriate.
>>>
>>> To try to pique your interest here is one of the stack traces:
>>>
>>> Stack Dump = org.osgi.framework.ServiceException: Exception in org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService()
>>>     at org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:287)
>>>     at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.releaseService(ServiceRegistrationImpl.java:562)
>>>     at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.releaseServicesInUse(ServiceRegistry.java:665)
>>>     at org.eclipse.osgi.framework.internal.core.BundleContextImpl.close(BundleContextImpl.java:91)
>>>     at org.eclipse.osgi.framework.internal.core.BundleHost.stopWorker(BundleHost.java:514)
>>>     at org.eclipse.osgi.framework.internal.core.AbstractBundle.suspend(AbstractBundle.java:565)
>>>     at org.eclipse.osgi.framework.internal.core.Framework.suspendBundle(Framework.java:1161)
>>>     at org.eclipse.osgi.framework.internal.core.StartLevelManager.decFWSL(StartLevelManager.java:595)
>>>     at org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:257)
>>>     at org.eclipse.osgi.framework.internal.core.StartLevelManager.shutdown(StartLevelManager.java:215)
>>>     at org.eclipse.osgi.framework.internal.core.InternalSystemBundle.suspend(InternalSystemBundle.java:284)
>>>     at org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:691)
>>>     at org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:598)
>>>     at org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
>>>     at java.lang.Thread.run(Thread.java:680)
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:614)
>>>     at org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:589)
>>>     at org.apache.felix.scr.impl.manager.AbstractComponentManager.log(AbstractComponentManager.java:633)
>>>     at org.apache.felix.scr.impl.manager.AbstractComponentManager$State.log(AbstractComponentManager.java:1000)
>>>     at org.apache.felix.scr.impl.manager.AbstractComponentManager$State.ungetService(AbstractComponentManager.java:964)
>>>     at org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService(DelayedComponentManager.java:114)
>>>     at org.eclipse.osgi.internal.serviceregistry.ServiceUse$3.run(ServiceUse.java:277)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:275)
>>>     ... 14 more
>>>
>>> I think the DelayedComponentManager.State here is Disposed but we've also
>>> seen this trace with state Active but the bundle context stopped (so
>>> ungetting throws an exception).
>>>
>>> many thanks
>>> david jencks
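
For illustration only, the "null checks plus catching exceptions" idea
discussed in this thread could look roughly like the sketch below. It is not
the actual SCR code and not one of the attached patches: SafeActivatorLog and
LogSink are hypothetical names standing in for BundleComponentActivator and
its log service tracker, and the assumption is that shutdown simply clears the
tracker field while another thread may still be returning from a slow unbind.

```java
/** Hypothetical stand-in for an activator whose log tracker is cleared on shutdown. */
class SafeActivatorLog {

    /** Hypothetical replacement for the log service tracker; not the SCR or OSGi API. */
    interface LogSink {
        void log(String message);
    }

    // volatile so that a thread returning from a slow unbind sees the shutdown
    private volatile LogSink sink;

    SafeActivatorLog(LogSink sink) {
        this.sink = sink;
    }

    /** Called by the thread that stops the bundle. */
    void dispose() {
        sink = null;
    }

    /** Returns true if the message was delivered, false if logging was no longer possible. */
    boolean log(String message) {
        // Read the field exactly once: a second read could see null after the check.
        LogSink current = sink;
        if (current == null) {
            return false; // already shut down, silently drop the message
        }
        try {
            current.log(message);
            return true;
        } catch (IllegalStateException e) {
            // Assumption: a stopped bundle context rejects the call this way.
            return false;
        }
    }
}
```

The point of copying the field into a local variable is that a plain
"if (sink != null) sink.log(...)" can still hit the NPE when the other thread
clears the field between the check and the call.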