Hi David,

Sorry for the delay ...

Am 21.02.2012 um 23:56 schrieb David Jencks:

> Hi Felix,
> 
> Sorry for the delay, I've been struggling to come up with an independent way 
> to reproduce this and finally have something similar in an integration test 
> which I've attached to the jira with a stack trace.
> 
> By messing with the timing I've gotten into a state where we're trying to log 
> on a BundleComponentActivator that has been shut down.  The activity being 
> logged is different, but the NPE is the same.  Admittedly I added the log 
> statement in question....
> 
> The basic scenario is two threads shutting stuff down concurrently, and an 
> unbind method that takes a long time to return.
> 
> bundle A has a service s1
> bundle B has a service s2 with a reference to s1, and an unbind method that 
> takes a long time.
> 
> Stop A in thread 1; the thread gets stuck in unbind of s2.
> 
> Stop B in thread 2; this shuts down B so it's BundleComponentActivator no 
> longer can log (the NPE).
> 
> If the unbind method in thread 1 now returns, it will try to log in B's BCA 
> which has no log service tracker.
> 
> At the moment I'm inclined to think that there is no way to prevent these 
> kinds of races between bundles shutting down and the best thing to do is null 
> checks in many places and catching exceptions from attempts to use stopped 
> bundle contexts; this is what my first two patches do.

Yes, I agree -- I've also done the null-check thing a couple of places.

> 
> BTW to get my test to work at all I started upgrading scr to more recent pax 
> bundles.  I think this would slightly simplify the code.  Also I think the 
> more modern way to run integration tests is with the maven-failsafe-plugin 
> rather than a separate surefire plugin execution.  Would you be interested in 
> a patch for this?

Yes. I once tried upgrading but failed because I don't know too much of it and 
I used to use features, which are not directly supported any longer ...

Regards
Felix

> 
> thanks
> david jencks
> 
> 
> On Feb 16, 2012, at 12:41 AM, Felix Meschberger wrote:
> 
>> Hi David,
>> 
>> Thanks for reporting. I have seen the reported issue. I will look into it 
>> ASAP.
>> 
>> My guts feeling tells me that some issues are just a question of doing 
>> null-checks properly while others might be more involved ...
>> 
>> I assume you have a scenario where you can reproduce reliably ?
>> 
>> Regards
>> Felix
>> 
>> Am 15.02.2012 um 07:55 schrieb David Jencks:
>> 
>>> see FELIX-3345
>>> 
>>> We've been seeing intermittent exceptions from SCR which generally seem to 
>>> look like trying to unget a service on a bundle context, 
>>> BundleComponentActivator, ComponentManager, or DependencyManager that are 
>>> shut down or being shut down.  I think there are 2 threads shutting bundles 
>>> down at once.  I'm not making much progress investigation exactly how this 
>>> happens so I'd really appreciate it if one of the experts could take a look 
>>> at the stack traces in the issue and attempt to guess whether there's a 
>>> real concurrency bug or if the situations we're seeing are expected when 
>>> more than one thread is shutting down bundles at once, and the "don't throw 
>>> an exception" patch I provided would be appropriate.
>>> 
>>> To try to pique your interest here is one of the stack traces:
>>> 
>>> Stack Dump = org.osgi.framework.ServiceException: Exception in 
>>> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService() 
>>> at 
>>> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:287)
>>>  
>>> at 
>>> org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.releaseService(ServiceRegistrationImpl.java:562)
>>>  
>>> at 
>>> org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.releaseServicesInUse(ServiceRegistry.java:665)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.BundleContextImpl.close(BundleContextImpl.java:91)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.BundleHost.stopWorker(BundleHost.java:514)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.AbstractBundle.suspend(AbstractBundle.java:565)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.Framework.suspendBundle(Framework.java:1161)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.StartLevelManager.decFWSL(StartLevelManager.java:595)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:257)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.StartLevelManager.shutdown(StartLevelManager.java:215)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.InternalSystemBundle.suspend(InternalSystemBundle.java:284)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:691)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:598)
>>>  
>>> at 
>>> org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
>>>  
>>> at java.lang.Thread.run(Thread.java:680) 
>>> Caused by: java.lang.NullPointerException 
>>> at 
>>> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:614)
>>>  
>>> at 
>>> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:589)
>>>  
>>> at 
>>> org.apache.felix.scr.impl.manager.AbstractComponentManager.log(AbstractComponentManager.java:633)
>>>  
>>> at 
>>> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.log(AbstractComponentManager.java:1000)
>>>  
>>> at 
>>> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.ungetService(AbstractComponentManager.java:964)
>>>  
>>> at 
>>> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService(DelayedComponentManager.java:114)
>>>  
>>> at 
>>> org.eclipse.osgi.internal.serviceregistry.ServiceUse$3.run(ServiceUse.java:277)
>>>  
>>> at java.security.AccessController.doPrivileged(Native Method) 
>>> at 
>>> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:275)
>>>  
>>> ... 14 more 
>>> 
>>> I think the DelayedComponentManager.State here is Disposed but we've also 
>>> seen this trace with state Active but the bundle context stopped (so 
>>> ungetting throws an exception).
>>> 
>>> many thanks
>>> david jencks
>>> 
>> 
> 

Reply via email to