Hi Pierre, I think there should be a thread dump in the log to go with this message. If there is could you send it to me or attach it to FELIX-4297?
Does this happen consistently? many thanks david jencks On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> wrote: > Hi David; > > 1) Every integration tests are passing OK. > > Regarding the OOM, I believe that the out of memory is likely to be cause > by the following load tests, which are running in DEBUG mode: > > src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java > src/test/java/org/apache/felix/scr/integration/Felix3680Test.java > src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java > > > So, I just added this in each files (in static initializers): > > DS_LOGLEVEL = "warn"; > > And I have no more OOM, with this nice test result message: > > Tests run: 103, Failures: 0, Errors: 0, Skipped: 0 > [INFO] BUILD SUCCESS > > > > 2) Now, I did again a non regression tests with our applications, and I'm > seeing a new WARN message (but it does not seem to prevent the application > to be correctly initialized: all components are in the end satisfied): > > (if you believe that it's now time to release, then you can go ahead, and > we'll then investigate this new issue after the release ... I let you > decide) > > -> > > 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent: > pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi - > [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager : > invokeBindMethod : timeout on open latch newtcpprocessor > > > > > kind regards; > /Pierre > > > On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected]>wrote: > >> Pierre, >> >> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the >> problems I found (for 2). I don't see the OOM often enough to have any >> confidence that anything I do would actually fix it, so I'm inclined to do >> nothing. Is that OK with you? >> >> Unless you can find some more problems :-) I'm planning to try another >> release when the config admin 1.8 gets to maven central. I'm going to >> update the pom to normally run against the CA 1.8 version supporting R5 and >> change the profile so running against R4 requires specifying profiles >> explicitly. >> >> thanks again! >> david jencks >> >> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]> wrote: >> >>> Hi Pierre, >>> >>> Much better to find these problems before a release than just after! >>> >>> I saw an OOM once recently but haven't been able to reproduce it. >>> >>> I'm looking into the NPE. I think I see the timing hole it is using but >> need to think about it some more. >>> >>> many thanks! >>> david jencks >>> >>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]> >> wrote: >>> >>>> Hi David, >>>> >>>> Looking at our configurator component we are currently using (but we >> will fix it in order to use the multi-location "?"), I see this: >>>> >>>> void configure(String pid, Dictionary pidConf) { >>>> Configuration config = getConfiguration(_pid, null); >>>> if (config.getBundleLocation() != null) { >>>> config.setBundleLocation(null); >>>> } >>>> config.update(pidConf); >>>> } >>>> >>>> So I believe that you are getting a null configuration because there is >> a short window between the setBundleLocation(null) (at this point, the >> configuration is null) and the config.update(pidConf) call ... >>>> >>>> So, the good news is that I'm not having anymore some NPE using your >> latest commits :-) and I think our application is now fully operational. >>>> >>>> but ... (please don't start to abominate me ) now, in order to do a >> final check, I restarted the integration tests and there is still two >> problems: >>>> >>>> 1) I'm sometimes getting some out of memory errors: this is probably >> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are >> currently configured in DEBUG mode ? >>>> >>>> 2) I ran the tests two times, and the second time, I got this exception >> with the failing >>>> Felix3680_2Test: >>>> >>>> >> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test) >> Time elapsed: 36.597 sec <<< ERROR! >>>> java.lang.NullPointerException >>>> at >> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710) >>>> at >> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387) >>>> at >> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355) >>>> at >> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290) >>>> at >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503) >>>> at >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398) >>>> at >> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258) >>>> at >> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437) >>>> at >> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932) >>>> at >> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793) >>>> at >> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543) >>>> at >> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260) >>>> at org.apache.felix.framework.Felix.access$000(Felix.java:74) >>>> at >> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390) >>>> at >> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148) >>>> at >> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127) >>>> at >> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136) >>>> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:722) >>>> >>>> Are you also getting this exception ? >>>> >>>> thanks >>>> >>>> /Pierre >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <[email protected]> >> wrote: >>>> Hi PIerre, >>>> >>>> Looking at the CA spec it looks like CA is supposed to send out >> CM_LOCATION_CHANGED events even before any properties are set when >> setBundleLocation is called. I added some code to ignore these events. >> Note that DS is "reserving" the configurations for (one of) the >> component(s) that will be consuming them by calling getConfiguration(pid). >>>> >>>> I do wonder how the location to something non-null on your >> configurations before the properties are set. >>>> >>>> Waiting for the next bug :-) >>>> >>>> thanks >>>> david jencks >>>> >>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]> >> wrote: >>>> >>>>> Hello David, >>>>> >>>>> The code we are using to configure our components is old, at at the >> time we wrote it, configadmin was not supporting multi-location. But I do >> agree, we can now use the "?" multi-location. >>>>> >>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not >> always): >>>>> >>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent: >> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering >> configuration event to [org.osgi.service.cm.ConfigurationListener, id=102, >> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] >>>>> >>>>> java.lang.NullPointerException >>>>> at >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) >>>>> at >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) >>>>> at >> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344) >>>>> at >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) >>>>> at >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) >>>>> at >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) >>>>> at java.lang.Thread.run(Thread.java:722) >>>>> >>>>> >>>>> I'm not sure, but it seems that ConfigAdmin is providing a null >> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct, then >> Is this a normal behavior ? >>>>> >>>>> This is strange; perhaps I shall start a new integration test ? >>>>> >>>>> /Pierre >>>>> >>>>> >>>>> >>>>> >>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <[email protected]> >> wrote: >>>>> Hi Pierre, >>>>> >>>>> This pointed out a logic error I introduced for Felix 3651. I opened >> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I >> found which I think explains the NPE. Could you check this? >>>>> >>>>> Could I ask what you are trying to do by setting the bundleLocation to >> null? If you want to allow any bundle to receive the configuration you >> could use multi-location support and set the location to "?" With the code >> you have now, if the configuration is already in use by a DS component, the >> location changed event will result in the bundle location being reset back >> to what it was. >>>>> >>>>> thanks! >>>>> david jencks >>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]> >> wrote: >>>>> >>>>>> Hi David, >>>>>> >>>>>> thanks; The fix is fixing the problem :-) >>>>>> >>>>>> but ... there's now a new different problem: i'm now sometimes >> getting this >>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event: >>>>>> >>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire ConfigurationEvent: >>>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering >>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener, >> id=102, >>>>>> >> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] >>>>>> >>>>>> java.lang.NullPointerException >>>>>> at >>>>>> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) >>>>>> at >>>>>> >> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) >>>>>> at >>>>>> >> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390) >>>>>> at >>>>>> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) >>>>>> at >>>>>> >> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) >>>>>> at >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) >>>>>> at java.lang.Thread.run(Thread.java:722) >>>>>> >>>>>> Perhaps a new jira issue shall be opened ? >>>>>> >>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our >>>>>> application, we populate configuration admin by doing something like >> this: >>>>>> >>>>>> Configuration cfg = cm.getConfiguration(pid, null) >>>>>> if (config.getBundleLocation() != null) { >>>>>> config.setBundleLocation(null); >>>>>> } >>>>>> >>>>>> The setBundleLocation(null) is probably useless, but this leads to a >>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the NPE. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Friday, October 25, 2013, David Jencks <[email protected]> >> wrote: >>>>>>> Hi Pierre, >>>>>>> >>>>>>> You are so good at writing useful tests!! >>>>>>> >>>>>>> I found a place to call setTargets(getProperties()) from inside >>>>>> ComponentFactoryImpl that would have fewer side effects. Could you >> see if >>>>>> this makes your actual applications work properly? I'm uploading a >>>>>> snapshot. >>>>>>> >>>>>>> many thanks >>>>>>> david jencks >>>>>>> >>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <[email protected]> >> wrote: >>>>>>> >>>>>>>> Hi David, >>>>>>>> >>>>>>>> Since this application is complex, I'm not able to provide logs >> because >>>>>>>> there are hundreds of components involved which are not mine, and >> for >>>>>> now, >>>>>>>> I'm not able to diagnose the problem. >>>>>>>> >>>>>>>> But I have created FELIX-4290, and joined to it an integration test >> which >>>>>>>> seems to reproduce the kind of problem I think I'm having in my >>>>>>>> application. I also joined the proposed patch. >>>>>>>> >>>>>>>> I did not have time to test the patch you suggested regarding the >>>>>>>> SingleComponentManager.reconfigure method, so let's continue to >>>>>> investigate >>>>>>>> using the jira issue and the test I attached to it. >>>>>>>> >>>>>>>> Thanks; >>>>>>>> >>>>>>>> /Pierre >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks < >> [email protected] >>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Pierre, >>>>>>>>> >>>>>>>>> I believe you that this code path doesn't work :-) >>>>>>>>> >>>>>>>>> I think there should be a less invasive way to fix this. By any >> chance >>>>>>>>> can you get a debug-enabled log from when this problem occurs? It >> would >>>>>>>>> help confirm my suspicions of what might be missing. >>>>>>>>> >>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a >> check for >>>>>>>>> m_factoryProperties here (line 561): >>>>>>>>> >>>>>>>>> // nothing to do if there is no configuration (see >> FELIX-714) >>>>>>>>> if ( configuration == null && m_configurationProperties == >>>>>>>>> null ) >>>>>>>>> { >>>>>>>>> log( LogService.LOG_DEBUG, "No configuration provided >> (or >>>>>>>>> deleted), nothing to do", null ); >>>>>>>>> return; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix this >>>>>> before >>>>>>>>> the release. >>>>>>>>> >>>>>>>>> thanks >>>>>>>>> david jencks >>>>>>>>> >>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <[email protected] >>> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi David, >>>>>>>>>> >>>>>>>>>> (sorry to do all this noise while you are releasing ...) >>>>>>>>>> >>>>>>>>>> We are indeed using factory components; and today, I finally >> found and >>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and I'm >> going >>>>>>>>>> further on but now I'm facing another problem which I did not >> have in >>>>>> the >>>>>>>>>> scr 1.6.2. >>>>>>>>>> >>>>>>>>>> So, I would like to discuss about this new problem with you >> before you >>>>>>>>> redo >>>>>>>>>> a release, in order to decide if this problem (if there is really >> one >>>>>> ?) >>>>>>>>>> shall be addressed now or after the upcoming release ? >>>>>>>>>> >>>>>>>>>> So, in our application, we are extensively using factory >> components >>>>>>>>>> (@Component(factory=XXX")). >>>>>>>>>> When we instantiate a factory component (using >>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance() >> method >>>>>> some >>>>>>>>>> additional component properties which may also contain some target >>>>>>>>> filters. >>>>>>>>>> >>>>>>>>>> This allows to dynamically configure the filter of some References >>>>>>>>> declared >>>>>>>>>> in the factory component. >>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using >> trunk, >>>>>> this >>>>>>>>>> does not work all the time. Some target filters seem to be >> correctly >>>>>>>>>> configured, and some others are not (I'm not sure, actually, it's >> late >>>>>>>>> ...). >>>>>>>>>> >>>>>>>>>> So, it looks like sometimes, some target filters are not updated >> before >>>>>>>>>> activating components ? or factory components ? >>>>>>>>>> >>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726. >>>>>>>>>> Now, interestingly, I did the following patch and my application >> is now >>>>>>>>>> working fine: In the AbstractComponentManager class, I >> systematically >>>>>>>>>> update target filters, like this: >>>>>>>>>> >>>>>>>>>> +++ >>>>>>>>>> >>>>>>>>> >>>>>> >> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java >>>>>>>>>> >>>>> >>>>> >>>> >>>> >>> >> >>
