David, I just attached the stacktrace to the FELIX-4297
On Tue, Oct 29, 2013 at 4:46 PM, Pierre De Rop <[email protected]>wrote: > yes David, consistently. but I don't see stacktraces. > > Now, looking at the code, I see that this stacktrace is logged in DEBUG: > > final void dumpThreads() > { > try > { > String dump = new ThreadDump().call(); > log( LogService.LOG_DEBUG, dump, null ); > > So, I will activate DEBUG and provide the stacktrace. > > But is this a bug to log this stacktrace in DEBUG ? > I mean: should it be logged in WARN, instead of DEBUG ? > > /pierre > > > On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <[email protected]>wrote: > >> Hi Pierre, >> >> I think there should be a thread dump in the log to go with this message. >> If there is could you send it to me or attach it to FELIX-4297? >> >> Does this happen consistently? >> >> many thanks >> david jencks >> >> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> >> wrote: >> >> > Hi David; >> > >> > 1) Every integration tests are passing OK. >> > >> > Regarding the OOM, I believe that the out of memory is likely to be >> cause >> > by the following load tests, which are running in DEBUG mode: >> > >> > >> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java >> > src/test/java/org/apache/felix/scr/integration/Felix3680Test.java >> > src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java >> > >> > >> > So, I just added this in each files (in static initializers): >> > >> > DS_LOGLEVEL = "warn"; >> > >> > And I have no more OOM, with this nice test result message: >> > >> > Tests run: 103, Failures: 0, Errors: 0, Skipped: 0 >> > [INFO] BUILD SUCCESS >> > >> > >> > >> > 2) Now, I did again a non regression tests with our applications, and >> I'm >> > seeing a new WARN message (but it does not seem to prevent the >> application >> > to be correctly initialized: all components are in the end satisfied): >> > >> > (if you believe that it's now time to release, then you can go ahead, >> and >> > we'll then investigate this new issue after the release ... I let you >> > decide) >> > >> > -> >> > >> > 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent: >> > pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi - >> > [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] >> DependencyManager : >> > invokeBindMethod : timeout on open latch newtcpprocessor >> > >> > >> > >> > >> > kind regards; >> > /Pierre >> > >> > >> > On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected] >> >wrote: >> > >> >> Pierre, >> >> >> >> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed >> the >> >> problems I found (for 2). I don't see the OOM often enough to have any >> >> confidence that anything I do would actually fix it, so I'm inclined >> to do >> >> nothing. Is that OK with you? >> >> >> >> Unless you can find some more problems :-) I'm planning to try another >> >> release when the config admin 1.8 gets to maven central. I'm going to >> >> update the pom to normally run against the CA 1.8 version supporting >> R5 and >> >> change the profile so running against R4 requires specifying profiles >> >> explicitly. >> >> >> >> thanks again! >> >> david jencks >> >> >> >> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]> >> wrote: >> >> >> >>> Hi Pierre, >> >>> >> >>> Much better to find these problems before a release than just after! >> >>> >> >>> I saw an OOM once recently but haven't been able to reproduce it. >> >>> >> >>> I'm looking into the NPE. I think I see the timing hole it is using >> but >> >> need to think about it some more. >> >>> >> >>> many thanks! >> >>> david jencks >> >>> >> >>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]> >> >> wrote: >> >>> >> >>>> Hi David, >> >>>> >> >>>> Looking at our configurator component we are currently using (but we >> >> will fix it in order to use the multi-location "?"), I see this: >> >>>> >> >>>> void configure(String pid, Dictionary pidConf) { >> >>>> Configuration config = getConfiguration(_pid, null); >> >>>> if (config.getBundleLocation() != null) { >> >>>> config.setBundleLocation(null); >> >>>> } >> >>>> config.update(pidConf); >> >>>> } >> >>>> >> >>>> So I believe that you are getting a null configuration because there >> is >> >> a short window between the setBundleLocation(null) (at this point, the >> >> configuration is null) and the config.update(pidConf) call ... >> >>>> >> >>>> So, the good news is that I'm not having anymore some NPE using your >> >> latest commits :-) and I think our application is now fully >> operational. >> >>>> >> >>>> but ... (please don't start to abominate me ) now, in order to do a >> >> final check, I restarted the integration tests and there is still two >> >> problems: >> >>>> >> >>>> 1) I'm sometimes getting some out of memory errors: this is probably >> >> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are >> >> currently configured in DEBUG mode ? >> >>>> >> >>>> 2) I ran the tests two times, and the second time, I got this >> exception >> >> with the failing >> >>>> Felix3680_2Test: >> >>>> >> >>>> >> >> >> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test) >> >> Time elapsed: 36.597 sec <<< ERROR! >> >>>> java.lang.NullPointerException >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258) >> >>>> at >> >> >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437) >> >>>> at >> >> >> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932) >> >>>> at >> >> >> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793) >> >>>> at >> >> >> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543) >> >>>> at >> >> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260) >> >>>> at org.apache.felix.framework.Felix.access$000(Felix.java:74) >> >>>> at >> >> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390) >> >>>> at >> >> >> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148) >> >>>> at >> >> >> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127) >> >>>> at >> >> >> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136) >> >>>> at >> >> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> >>>> at >> >> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> >>>> at java.lang.Thread.run(Thread.java:722) >> >>>> >> >>>> Are you also getting this exception ? >> >>>> >> >>>> thanks >> >>>> >> >>>> /Pierre >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks < >> [email protected]> >> >> wrote: >> >>>> Hi PIerre, >> >>>> >> >>>> Looking at the CA spec it looks like CA is supposed to send out >> >> CM_LOCATION_CHANGED events even before any properties are set when >> >> setBundleLocation is called. I added some code to ignore these events. >> >> Note that DS is "reserving" the configurations for (one of) the >> >> component(s) that will be consuming them by calling >> getConfiguration(pid). >> >>>> >> >>>> I do wonder how the location to something non-null on your >> >> configurations before the properties are set. >> >>>> >> >>>> Waiting for the next bug :-) >> >>>> >> >>>> thanks >> >>>> david jencks >> >>>> >> >>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]> >> >> wrote: >> >>>> >> >>>>> Hello David, >> >>>>> >> >>>>> The code we are using to configure our components is old, at at the >> >> time we wrote it, configadmin was not supporting multi-location. But I >> do >> >> agree, we can now use the "?" multi-location. >> >>>>> >> >>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not >> >> always): >> >>>>> >> >>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire >> ConfigurationEvent: >> >> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering >> >> configuration event to [org.osgi.service.cm.ConfigurationListener, >> id=102, >> >> >> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] >> >>>>> >> >>>>> java.lang.NullPointerException >> >>>>> at >> >> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) >> >>>>> at >> >> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) >> >>>>> at >> >> >> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344) >> >>>>> at >> >> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) >> >>>>> at >> >> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) >> >>>>> at >> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) >> >>>>> at java.lang.Thread.run(Thread.java:722) >> >>>>> >> >>>>> >> >>>>> I'm not sure, but it seems that ConfigAdmin is providing a null >> >> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct, >> then >> >> Is this a normal behavior ? >> >>>>> >> >>>>> This is strange; perhaps I shall start a new integration test ? >> >>>>> >> >>>>> /Pierre >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks < >> [email protected]> >> >> wrote: >> >>>>> Hi Pierre, >> >>>>> >> >>>>> This pointed out a logic error I introduced for Felix 3651. I >> opened >> >> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I >> >> found which I think explains the NPE. Could you check this? >> >>>>> >> >>>>> Could I ask what you are trying to do by setting the bundleLocation >> to >> >> null? If you want to allow any bundle to receive the configuration you >> >> could use multi-location support and set the location to "?" With the >> code >> >> you have now, if the configuration is already in use by a DS >> component, the >> >> location changed event will result in the bundle location being reset >> back >> >> to what it was. >> >>>>> >> >>>>> thanks! >> >>>>> david jencks >> >>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]> >> >> wrote: >> >>>>> >> >>>>>> Hi David, >> >>>>>> >> >>>>>> thanks; The fix is fixing the problem :-) >> >>>>>> >> >>>>>> but ... there's now a new different problem: i'm now sometimes >> >> getting this >> >>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event: >> >>>>>> >> >>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire >> ConfigurationEvent: >> >>>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering >> >>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener, >> >> id=102, >> >>>>>> >> >> >> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] >> >>>>>> >> >>>>>> java.lang.NullPointerException >> >>>>>> at >> >>>>>> >> >> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) >> >>>>>> at >> >>>>>> >> >> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) >> >>>>>> at >> >>>>>> >> >> >> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390) >> >>>>>> at >> >>>>>> >> >> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) >> >>>>>> at >> >>>>>> >> >> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) >> >>>>>> at >> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) >> >>>>>> at java.lang.Thread.run(Thread.java:722) >> >>>>>> >> >>>>>> Perhaps a new jira issue shall be opened ? >> >>>>>> >> >>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our >> >>>>>> application, we populate configuration admin by doing something >> like >> >> this: >> >>>>>> >> >>>>>> Configuration cfg = cm.getConfiguration(pid, null) >> >>>>>> if (config.getBundleLocation() != null) { >> >>>>>> config.setBundleLocation(null); >> >>>>>> } >> >>>>>> >> >>>>>> The setBundleLocation(null) is probably useless, but this leads to >> a >> >>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the >> NPE. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Friday, October 25, 2013, David Jencks <[email protected]> >> >> wrote: >> >>>>>>> Hi Pierre, >> >>>>>>> >> >>>>>>> You are so good at writing useful tests!! >> >>>>>>> >> >>>>>>> I found a place to call setTargets(getProperties()) from inside >> >>>>>> ComponentFactoryImpl that would have fewer side effects. Could you >> >> see if >> >>>>>> this makes your actual applications work properly? I'm uploading a >> >>>>>> snapshot. >> >>>>>>> >> >>>>>>> many thanks >> >>>>>>> david jencks >> >>>>>>> >> >>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop < >> [email protected]> >> >> wrote: >> >>>>>>> >> >>>>>>>> Hi David, >> >>>>>>>> >> >>>>>>>> Since this application is complex, I'm not able to provide logs >> >> because >> >>>>>>>> there are hundreds of components involved which are not mine, and >> >> for >> >>>>>> now, >> >>>>>>>> I'm not able to diagnose the problem. >> >>>>>>>> >> >>>>>>>> But I have created FELIX-4290, and joined to it an integration >> test >> >> which >> >>>>>>>> seems to reproduce the kind of problem I think I'm having in my >> >>>>>>>> application. I also joined the proposed patch. >> >>>>>>>> >> >>>>>>>> I did not have time to test the patch you suggested regarding the >> >>>>>>>> SingleComponentManager.reconfigure method, so let's continue to >> >>>>>> investigate >> >>>>>>>> using the jira issue and the test I attached to it. >> >>>>>>>> >> >>>>>>>> Thanks; >> >>>>>>>> >> >>>>>>>> /Pierre >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks < >> >> [email protected] >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>>> Hi Pierre, >> >>>>>>>>> >> >>>>>>>>> I believe you that this code path doesn't work :-) >> >>>>>>>>> >> >>>>>>>>> I think there should be a less invasive way to fix this. By any >> >> chance >> >>>>>>>>> can you get a debug-enabled log from when this problem occurs? >> It >> >> would >> >>>>>>>>> help confirm my suspicions of what might be missing. >> >>>>>>>>> >> >>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a >> >> check for >> >>>>>>>>> m_factoryProperties here (line 561): >> >>>>>>>>> >> >>>>>>>>> // nothing to do if there is no configuration (see >> >> FELIX-714) >> >>>>>>>>> if ( configuration == null && m_configurationProperties >> == >> >>>>>>>>> null ) >> >>>>>>>>> { >> >>>>>>>>> log( LogService.LOG_DEBUG, "No configuration >> provided >> >> (or >> >>>>>>>>> deleted), nothing to do", null ); >> >>>>>>>>> return; >> >>>>>>>>> } >> >>>>>>>>> >> >>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix >> this >> >>>>>> before >> >>>>>>>>> the release. >> >>>>>>>>> >> >>>>>>>>> thanks >> >>>>>>>>> david jencks >> >>>>>>>>> >> >>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop < >> [email protected] >> >>> >> >>>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> Hi David, >> >>>>>>>>>> >> >>>>>>>>>> (sorry to do all this noise while you are releasing ...) >> >>>>>>>>>> >> >>>>>>>>>> We are indeed using factory components; and today, I finally >> >> found and >> >>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and >> I'm >> >> going >> >>>>>>>>>> further on but now I'm facing another problem which I did not >> >> have in >> >>>>>> the >> >>>>>>>>>> scr 1.6.2. >> >>>>>>>>>> >> >>>>>>>>>> So, I would like to discuss about this new problem with you >> >> before you >> >>>>>>>>> redo >> >>>>>>>>>> a release, in order to decide if this problem (if there is >> really >> >> one >> >>>>>> ?) >> >>>>>>>>>> shall be addressed now or after the upcoming release ? >> >>>>>>>>>> >> >>>>>>>>>> So, in our application, we are extensively using factory >> >> components >> >>>>>>>>>> (@Component(factory=XXX")). >> >>>>>>>>>> When we instantiate a factory component (using >> >>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance() >> >> method >> >>>>>> some >> >>>>>>>>>> additional component properties which may also contain some >> target >> >>>>>>>>> filters. >> >>>>>>>>>> >> >>>>>>>>>> This allows to dynamically configure the filter of some >> References >> >>>>>>>>> declared >> >>>>>>>>>> in the factory component. >> >>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using >> >> trunk, >> >>>>>> this >> >>>>>>>>>> does not work all the time. Some target filters seem to be >> >> correctly >> >>>>>>>>>> configured, and some others are not (I'm not sure, actually, >> it's >> >> late >> >>>>>>>>> ...). >> >>>>>>>>>> >> >>>>>>>>>> So, it looks like sometimes, some target filters are not >> updated >> >> before >> >>>>>>>>>> activating components ? or factory components ? >> >>>>>>>>>> >> >>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726. >> >>>>>>>>>> Now, interestingly, I did the following patch and my >> application >> >> is now >> >>>>>>>>>> working fine: In the AbstractComponentManager class, I >> >> systematically >> >>>>>>>>>> update target filters, like this: >> >>>>>>>>>> >> >>>>>>>>>> +++ >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>> >> >> >> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java >> >>>>>>>>>> >> >>>>> >> >>>>> >> >>>> >> >>>> >> >>> >> >> >> >> >> >> >
