Pierre,

I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the problems I found (for 2). I don't see the OOM often enough to have any confidence that anything I do would actually fix it, so I'm inclined to do nothing. Is that OK with you?

Unless you can find some more problems :-) I'm planning to try another release when the config admin 1.8 gets to maven central. I'm going to update the pom to normally run against the CA 1.8 version supporting R5 and change the profile so running against R4 requires specifying profiles explicitly.

thanks again!
david jencks

On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]> wrote:

> Hi Pierre,
>
> Much better to find these problems before a release than just after!
>
> I saw an OOM once recently but haven't been able to reproduce it.
>
> I'm looking into the NPE. I think I see the timing hole it is using but need
> to think about it some more.
>
> many thanks!
> david jencks
>
> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]> wrote:
>
>> Hi David,
>>
>> Looking at our configurator component we are currently using (but we will
>> fix it in order to use the multi-location "?"), I see this:
>>
>> void configure(String pid, Dictionary pidConf) {
>>     Configuration config = getConfiguration(_pid, null);
>>     if (config.getBundleLocation() != null) {
>>         config.setBundleLocation(null);
>>     }
>>     config.update(pidConf);
>> }
>>
>> So I believe that you are getting a null configuration because there is a
>> short window between the setBundleLocation(null) (at this point, the
>> configuration is null) and the config.update(pidConf) call ...
>>
>> So, the good news is that I'm no longer getting any NPE using your latest
>> commits :-) and I think our application is now fully operational.
>>
>> but ... (please don't start to abominate me ) now, in order to do a final
>> check, I restarted the integration tests and there are still two problems:
>>
>> 1) I'm sometimes getting some out of memory errors: this is probably caused
>> by the ComponentConcurrencyTest/Felix3680Test tests, which are currently
>> configured in DEBUG mode ?
>>
>> 2) I ran the tests two times, and the second time, I got this exception with
>> the failing Felix3680_2Test:
>>
>> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
>> Time elapsed: 36.597 sec <<< ERROR!
>> java.lang.NullPointerException
>>     at org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
>>     at org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
>>     at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
>>     at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
>>     at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
>>     at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
>>     at org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
>>     at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
>>     at org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>>     at org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>>     at org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>>     at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>>     at org.apache.felix.framework.Felix.access$000(Felix.java:74)
>>     at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
>>     at org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
>>     at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
>>     at org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:722)
>>
>> Are you also getting this exception ?
>>
>> thanks
>>
>> /Pierre
>>
>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <[email protected]> wrote:
>> Hi Pierre,
>>
>> Looking at the CA spec it looks like CA is supposed to send out
>> CM_LOCATION_CHANGED events even before any properties are set when
>> setBundleLocation is called. I added some code to ignore these events.
>> Note that DS is "reserving" the configurations for (one of) the component(s)
>> that will be consuming them by calling getConfiguration(pid).
>>
>> I do wonder how the location gets set to something non-null on your
>> configurations before the properties are set.
>>
>> Waiting for the next bug :-)
>>
>> thanks
>> david jencks
>>
>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]> wrote:
>>
>>> Hello David,
>>>
>>> The code we are using to configure our components is old; at the time we
>>> wrote it, configadmin was not supporting multi-location. But I do agree, we
>>> can now use the "?" multi-location.
>>>
>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not always):
>>>
>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
>>> configuration event to [org.osgi.service.cm.ConfigurationListener, id=102,
>>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>
>>> java.lang.NullPointerException
>>>     at org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>     at org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>     at org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
>>>     at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>     at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>     at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>     at java.lang.Thread.run(Thread.java:722)
>>>
>>> I'm not sure, but it seems that ConfigAdmin is providing a null dictionary
>>> when delivering a CM_LOCATION_CHANGED event ? If correct, then is this
>>> normal behavior ?
>>>
>>> This is strange; perhaps I shall start a new integration test ?
>>>
>>> /Pierre
>>>
>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <[email protected]> wrote:
>>> Hi Pierre,
>>>
>>> This pointed out a logic error I introduced for Felix 3651. I opened
>>> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
>>> found which I think explains the NPE. Could you check this?
>>>
>>> Could I ask what you are trying to do by setting the bundleLocation to
>>> null? If you want to allow any bundle to receive the configuration you
>>> could use multi-location support and set the location to "?". With the code
>>> you have now, if the configuration is already in use by a DS component, the
>>> location changed event will result in the bundle location being reset back
>>> to what it was.
>>>
>>> thanks!
>>> david jencks
>>>
>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> thanks; The fix is fixing the problem :-)
>>>>
>>>> but ... there's now a new different problem: I'm now sometimes getting this
>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
>>>>
>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire ConfigurationEvent:
>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
>>>> configuration event to [org.osgi.service.cm.ConfigurationListener, id=102,
>>>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>
>>>> java.lang.NullPointerException
>>>>     at org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>     at org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>     at org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
>>>>     at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>     at org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>     at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>     at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Perhaps a new jira issue shall be opened ?
>>>>
>>>> I think we are getting a CM_LOCATION_CHANGED event because in our
>>>> application, we populate configuration admin by doing something like this:
>>>>
>>>> Configuration config = cm.getConfiguration(pid, null);
>>>> if (config.getBundleLocation() != null) {
>>>>     config.setBundleLocation(null);
>>>> }
>>>>
>>>> The setBundleLocation(null) is probably useless, but this leads to a
>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the NPE.
>>>>
>>>> On Friday, October 25, 2013, David Jencks <[email protected]> wrote:
>>>>> Hi Pierre,
>>>>>
>>>>> You are so good at writing useful tests!!
>>>>>
>>>>> I found a place to call setTargets(getProperties()) from inside
>>>>> ComponentFactoryImpl that would have fewer side effects. Could you see if
>>>>> this makes your actual applications work properly? I'm uploading a
>>>>> snapshot.
>>>>>
>>>>> many thanks
>>>>> david jencks
>>>>>
>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <[email protected]> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Since this application is complex, I'm not able to provide logs because
>>>>>> there are hundreds of components involved which are not mine, and for now,
>>>>>> I'm not able to diagnose the problem.
>>>>>>
>>>>>> But I have created FELIX-4290, and attached to it an integration test which
>>>>>> seems to reproduce the kind of problem I think I'm having in my
>>>>>> application. I also attached the proposed patch.
>>>>>>
>>>>>> I did not have time to test the patch you suggested regarding the
>>>>>> SingleComponentManager.reconfigure method, so let's continue to investigate
>>>>>> using the jira issue and the test I attached to it.
>>>>>>
>>>>>> Thanks;
>>>>>>
>>>>>> /Pierre
>>>>>>
>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Pierre,
>>>>>>>
>>>>>>> I believe you that this code path doesn't work :-)
>>>>>>>
>>>>>>> I think there should be a less invasive way to fix this. By any chance
>>>>>>> can you get a debug-enabled log from when this problem occurs? It would
>>>>>>> help confirm my suspicions of what might be missing.
>>>>>>>
>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a check for
>>>>>>> m_factoryProperties here (line 561):
>>>>>>>
>>>>>>> // nothing to do if there is no configuration (see FELIX-714)
>>>>>>> if ( configuration == null && m_configurationProperties == null )
>>>>>>> {
>>>>>>>     log( LogService.LOG_DEBUG, "No configuration provided (or deleted), nothing to do", null );
>>>>>>>     return;
>>>>>>> }
>>>>>>>
>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix this before
>>>>>>> the release.
>>>>>>>
>>>>>>> thanks
>>>>>>> david jencks
>>>>>>>
>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> (sorry to make all this noise while you are releasing ...)
>>>>>>>>
>>>>>>>> We are indeed using factory components; and today, I finally found and
>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and I'm going
>>>>>>>> further on but now I'm facing another problem which I did not have in the
>>>>>>>> scr 1.6.2.
>>>>>>>>
>>>>>>>> So, I would like to discuss this new problem with you before you redo
>>>>>>>> a release, in order to decide if this problem (if there is really one ?)
>>>>>>>> shall be addressed now or after the upcoming release ?
>>>>>>>>
>>>>>>>> So, in our application, we are extensively using factory components
>>>>>>>> (@Component(factory="XXX")).
>>>>>>>> When we instantiate a factory component (using
>>>>>>>> ComponentFactory.newInstance()), we pass to the newInstance() method some
>>>>>>>> additional component properties which may also contain some target filters.
>>>>>>>>
>>>>>>>> This allows us to dynamically configure the filter of some References
>>>>>>>> declared in the factory component.
>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using trunk, this
>>>>>>>> does not work all the time. Some target filters seem to be correctly
>>>>>>>> configured, and some others are not (I'm not sure, actually, it's late ...).
>>>>>>>>
>>>>>>>> So, it looks like sometimes, some target filters are not updated before
>>>>>>>> activating components ? or factory components ?
>>>>>>>>
>>>>>>>> I'm not sure but this might be related to the old FELIX-3726.
>>>>>>>> Now, interestingly, I did the following patch and my application is now
>>>>>>>> working fine: In the AbstractComponentManager class, I systematically
>>>>>>>> update target filters, like this:
>>>>>>>>
>>>>>>>> +++ src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
