Hi David, no, I did not notice the ERROR message you are mentioning. (see the full logs in stacktrace2.log which I juste attached in the issue).
Please forget my comment about the thread dump in WARN. I did not realize that it was intentional to log it in DEBUG. regards; /Pierre On Tue, Oct 29, 2013 at 6:40 PM, David Jencks <[email protected]>wrote: > HI Pierre, > > Do you by any chance see a log message at ERROR level like > > Cannot create component instance due to failure to bind reference {0} > > shortly before the timeout message? > > I'm not sure if it's a good idea to dump the threads at WARN. If you > think it is I'll change it. Some people get upset if the logs are too long > :-) > > thanks > david jencks > > > On Oct 29, 2013, at 8:46 AM, Pierre De Rop <[email protected]> wrote: > > > yes David, consistently. but I don't see stacktraces. > > > > Now, looking at the code, I see that this stacktrace is logged in DEBUG: > > > > final void dumpThreads() > > { > > try > > { > > String dump = new ThreadDump().call(); > > log( LogService.LOG_DEBUG, dump, null ); > > > > So, I will activate DEBUG and provide the stacktrace. > > > > But is this a bug to log this stacktrace in DEBUG ? > > I mean: should it be logged in WARN, instead of DEBUG ? > > > > /pierre > > > > > > On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <[email protected] > >wrote: > > > >> Hi Pierre, > >> > >> I think there should be a thread dump in the log to go with this > message. > >> If there is could you send it to me or attach it to FELIX-4297? > >> > >> Does this happen consistently? > >> > >> many thanks > >> david jencks > >> > >> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> > wrote: > >> > >>> Hi David; > >>> > >>> 1) Every integration tests are passing OK. > >>> > >>> Regarding the OOM, I believe that the out of memory is likely to be > cause > >>> by the following load tests, which are running in DEBUG mode: > >>> > >>> > >> > src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java > >>> src/test/java/org/apache/felix/scr/integration/Felix3680Test.java > >>> src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java > >>> > >>> > >>> So, I just added this in each files (in static initializers): > >>> > >>> DS_LOGLEVEL = "warn"; > >>> > >>> And I have no more OOM, with this nice test result message: > >>> > >>> Tests run: 103, Failures: 0, Errors: 0, Skipped: 0 > >>> [INFO] BUILD SUCCESS > >>> > >>> > >>> > >>> 2) Now, I did again a non regression tests with our applications, and > I'm > >>> seeing a new WARN message (but it does not seem to prevent the > >> application > >>> to be correctly initialized: all components are in the end satisfied): > >>> > >>> (if you believe that it's now time to release, then you can go ahead, > and > >>> we'll then investigate this new issue after the release ... I let you > >>> decide) > >>> > >>> -> > >>> > >>> 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent: > >>> pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi - > >>> [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] > DependencyManager > >> : > >>> invokeBindMethod : timeout on open latch newtcpprocessor > >>> > >>> > >>> > >>> > >>> kind regards; > >>> /Pierre > >>> > >>> > >>> On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected] > >>> wrote: > >>> > >>>> Pierre, > >>>> > >>>> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed > the > >>>> problems I found (for 2). I don't see the OOM often enough to have > any > >>>> confidence that anything I do would actually fix it, so I'm inclined > to > >> do > >>>> nothing. Is that OK with you? > >>>> > >>>> Unless you can find some more problems :-) I'm planning to try > another > >>>> release when the config admin 1.8 gets to maven central. I'm going to > >>>> update the pom to normally run against the CA 1.8 version supporting > R5 > >> and > >>>> change the profile so running against R4 requires specifying profiles > >>>> explicitly. > >>>> > >>>> thanks again! > >>>> david jencks > >>>> > >>>> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]> > >> wrote: > >>>> > >>>>> Hi Pierre, > >>>>> > >>>>> Much better to find these problems before a release than just after! > >>>>> > >>>>> I saw an OOM once recently but haven't been able to reproduce it. > >>>>> > >>>>> I'm looking into the NPE. I think I see the timing hole it is using > >> but > >>>> need to think about it some more. > >>>>> > >>>>> many thanks! > >>>>> david jencks > >>>>> > >>>>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]> > >>>> wrote: > >>>>> > >>>>>> Hi David, > >>>>>> > >>>>>> Looking at our configurator component we are currently using (but we > >>>> will fix it in order to use the multi-location "?"), I see this: > >>>>>> > >>>>>> void configure(String pid, Dictionary pidConf) { > >>>>>> Configuration config = getConfiguration(_pid, null); > >>>>>> if (config.getBundleLocation() != null) { > >>>>>> config.setBundleLocation(null); > >>>>>> } > >>>>>> config.update(pidConf); > >>>>>> } > >>>>>> > >>>>>> So I believe that you are getting a null configuration because there > >> is > >>>> a short window between the setBundleLocation(null) (at this point, the > >>>> configuration is null) and the config.update(pidConf) call ... > >>>>>> > >>>>>> So, the good news is that I'm not having anymore some NPE using your > >>>> latest commits :-) and I think our application is now fully > operational. > >>>>>> > >>>>>> but ... (please don't start to abominate me ) now, in order to do a > >>>> final check, I restarted the integration tests and there is still two > >>>> problems: > >>>>>> > >>>>>> 1) I'm sometimes getting some out of memory errors: this is probably > >>>> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are > >>>> currently configured in DEBUG mode ? > >>>>>> > >>>>>> 2) I ran the tests two times, and the second time, I got this > >> exception > >>>> with the failing > >>>>>> Felix3680_2Test: > >>>>>> > >>>>>> > >>>> > >> > test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test) > >>>> Time elapsed: 36.597 sec <<< ERROR! > >>>>>> java.lang.NullPointerException > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258) > >>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437) > >>>>>> at > >>>> > >> > org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932) > >>>>>> at > >>>> > >> > org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793) > >>>>>> at > >>>> > >> > org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543) > >>>>>> at > >>>> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260) > >>>>>> at org.apache.felix.framework.Felix.access$000(Felix.java:74) > >>>>>> at > >>>> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390) > >>>>>> at > >>>> > >> > org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148) > >>>>>> at > >>>> > >> > org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127) > >>>>>> at > >>>> > >> > org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136) > >>>>>> at > >>>> > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > >>>>>> at > >>>> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > >>>>>> at java.lang.Thread.run(Thread.java:722) > >>>>>> > >>>>>> Are you also getting this exception ? > >>>>>> > >>>>>> thanks > >>>>>> > >>>>>> /Pierre > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks < > [email protected] > >>> > >>>> wrote: > >>>>>> Hi PIerre, > >>>>>> > >>>>>> Looking at the CA spec it looks like CA is supposed to send out > >>>> CM_LOCATION_CHANGED events even before any properties are set when > >>>> setBundleLocation is called. I added some code to ignore these > events. > >>>> Note that DS is "reserving" the configurations for (one of) the > >>>> component(s) that will be consuming them by calling > >> getConfiguration(pid). > >>>>>> > >>>>>> I do wonder how the location to something non-null on your > >>>> configurations before the properties are set. > >>>>>> > >>>>>> Waiting for the next bug :-) > >>>>>> > >>>>>> thanks > >>>>>> david jencks > >>>>>> > >>>>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]> > >>>> wrote: > >>>>>> > >>>>>>> Hello David, > >>>>>>> > >>>>>>> The code we are using to configure our components is old, at at the > >>>> time we wrote it, configadmin was not supporting multi-location. But I > >> do > >>>> agree, we can now use the "?" multi-location. > >>>>>>> > >>>>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not > >>>> always): > >>>>>>> > >>>>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire > ConfigurationEvent: > >>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering > >>>> configuration event to [org.osgi.service.cm.ConfigurationListener, > >> id=102, > >>>> > >> > bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] > >>>>>>> > >>>>>>> java.lang.NullPointerException > >>>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) > >>>>>>> at > >>>> > >> > org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) > >>>>>>> at > >>>> > >> > org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344) > >>>>>>> at > >>>> > >> > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) > >>>>>>> at > >>>> > >> > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) > >>>>>>> at > >>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) > >>>>>>> at java.lang.Thread.run(Thread.java:722) > >>>>>>> > >>>>>>> > >>>>>>> I'm not sure, but it seems that ConfigAdmin is providing a null > >>>> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct, > >> then > >>>> Is this a normal behavior ? > >>>>>>> > >>>>>>> This is strange; perhaps I shall start a new integration test ? > >>>>>>> > >>>>>>> /Pierre > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks < > >> [email protected]> > >>>> wrote: > >>>>>>> Hi Pierre, > >>>>>>> > >>>>>>> This pointed out a logic error I introduced for Felix 3651. I > opened > >>>> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error > I > >>>> found which I think explains the NPE. Could you check this? > >>>>>>> > >>>>>>> Could I ask what you are trying to do by setting the bundleLocation > >> to > >>>> null? If you want to allow any bundle to receive the configuration > you > >>>> could use multi-location support and set the location to "?" With the > >> code > >>>> you have now, if the configuration is already in use by a DS > component, > >> the > >>>> location changed event will result in the bundle location being reset > >> back > >>>> to what it was. > >>>>>>> > >>>>>>> thanks! > >>>>>>> david jencks > >>>>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected] > > > >>>> wrote: > >>>>>>> > >>>>>>>> Hi David, > >>>>>>>> > >>>>>>>> thanks; The fix is fixing the problem :-) > >>>>>>>> > >>>>>>>> but ... there's now a new different problem: i'm now sometimes > >>>> getting this > >>>>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event: > >>>>>>>> > >>>>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire > >> ConfigurationEvent: > >>>>>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering > >>>>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener, > >>>> id=102, > >>>>>>>> > >>>> > >> > bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar] > >>>>>>>> > >>>>>>>> java.lang.NullPointerException > >>>>>>>> at > >>>>>>>> > >>>> > >> > org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226) > >>>>>>>> at > >>>>>>>> > >>>> > >> > org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396) > >>>>>>>> at > >>>>>>>> > >>>> > >> > org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390) > >>>>>>>> at > >>>>>>>> > >>>> > >> > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032) > >>>>>>>> at > >>>>>>>> > >>>> > >> > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002) > >>>>>>>> at > >>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103) > >>>>>>>> at java.lang.Thread.run(Thread.java:722) > >>>>>>>> > >>>>>>>> Perhaps a new jira issue shall be opened ? > >>>>>>>> > >>>>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our > >>>>>>>> application, we populate configuration admin by doing something > like > >>>> this: > >>>>>>>> > >>>>>>>> Configuration cfg = cm.getConfiguration(pid, null) > >>>>>>>> if (config.getBundleLocation() != null) { > >>>>>>>> config.setBundleLocation(null); > >>>>>>>> } > >>>>>>>> > >>>>>>>> The setBundleLocation(null) is probably useless, but this leads > to a > >>>>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the > >> NPE. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Friday, October 25, 2013, David Jencks <[email protected] > > > >>>> wrote: > >>>>>>>>> Hi Pierre, > >>>>>>>>> > >>>>>>>>> You are so good at writing useful tests!! > >>>>>>>>> > >>>>>>>>> I found a place to call setTargets(getProperties()) from inside > >>>>>>>> ComponentFactoryImpl that would have fewer side effects. Could > you > >>>> see if > >>>>>>>> this makes your actual applications work properly? I'm uploading > a > >>>>>>>> snapshot. > >>>>>>>>> > >>>>>>>>> many thanks > >>>>>>>>> david jencks > >>>>>>>>> > >>>>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop < > [email protected] > >>> > >>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi David, > >>>>>>>>>> > >>>>>>>>>> Since this application is complex, I'm not able to provide logs > >>>> because > >>>>>>>>>> there are hundreds of components involved which are not mine, > and > >>>> for > >>>>>>>> now, > >>>>>>>>>> I'm not able to diagnose the problem. > >>>>>>>>>> > >>>>>>>>>> But I have created FELIX-4290, and joined to it an integration > >> test > >>>> which > >>>>>>>>>> seems to reproduce the kind of problem I think I'm having in my > >>>>>>>>>> application. I also joined the proposed patch. > >>>>>>>>>> > >>>>>>>>>> I did not have time to test the patch you suggested regarding > the > >>>>>>>>>> SingleComponentManager.reconfigure method, so let's continue to > >>>>>>>> investigate > >>>>>>>>>> using the jira issue and the test I attached to it. > >>>>>>>>>> > >>>>>>>>>> Thanks; > >>>>>>>>>> > >>>>>>>>>> /Pierre > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks < > >>>> [email protected] > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Pierre, > >>>>>>>>>>> > >>>>>>>>>>> I believe you that this code path doesn't work :-) > >>>>>>>>>>> > >>>>>>>>>>> I think there should be a less invasive way to fix this. By > any > >>>> chance > >>>>>>>>>>> can you get a debug-enabled log from when this problem occurs? > >> It > >>>> would > >>>>>>>>>>> help confirm my suspicions of what might be missing. > >>>>>>>>>>> > >>>>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a > >>>> check for > >>>>>>>>>>> m_factoryProperties here (line 561): > >>>>>>>>>>> > >>>>>>>>>>> // nothing to do if there is no configuration (see > >>>> FELIX-714) > >>>>>>>>>>> if ( configuration == null && m_configurationProperties > >> == > >>>>>>>>>>> null ) > >>>>>>>>>>> { > >>>>>>>>>>> log( LogService.LOG_DEBUG, "No configuration > provided > >>>> (or > >>>>>>>>>>> deleted), nothing to do", null ); > >>>>>>>>>>> return; > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix > >> this > >>>>>>>> before > >>>>>>>>>>> the release. > >>>>>>>>>>> > >>>>>>>>>>> thanks > >>>>>>>>>>> david jencks > >>>>>>>>>>> > >>>>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop < > >> [email protected] > >>>>> > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi David, > >>>>>>>>>>>> > >>>>>>>>>>>> (sorry to do all this noise while you are releasing ...) > >>>>>>>>>>>> > >>>>>>>>>>>> We are indeed using factory components; and today, I finally > >>>> found and > >>>>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and > I'm > >>>> going > >>>>>>>>>>>> further on but now I'm facing another problem which I did not > >>>> have in > >>>>>>>> the > >>>>>>>>>>>> scr 1.6.2. > >>>>>>>>>>>> > >>>>>>>>>>>> So, I would like to discuss about this new problem with you > >>>> before you > >>>>>>>>>>> redo > >>>>>>>>>>>> a release, in order to decide if this problem (if there is > >> really > >>>> one > >>>>>>>> ?) > >>>>>>>>>>>> shall be addressed now or after the upcoming release ? > >>>>>>>>>>>> > >>>>>>>>>>>> So, in our application, we are extensively using factory > >>>> components > >>>>>>>>>>>> (@Component(factory=XXX")). > >>>>>>>>>>>> When we instantiate a factory component (using > >>>>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance() > >>>> method > >>>>>>>> some > >>>>>>>>>>>> additional component properties which may also contain some > >> target > >>>>>>>>>>> filters. > >>>>>>>>>>>> > >>>>>>>>>>>> This allows to dynamically configure the filter of some > >> References > >>>>>>>>>>> declared > >>>>>>>>>>>> in the factory component. > >>>>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using > >>>> trunk, > >>>>>>>> this > >>>>>>>>>>>> does not work all the time. Some target filters seem to be > >>>> correctly > >>>>>>>>>>>> configured, and some others are not (I'm not sure, actually, > >> it's > >>>> late > >>>>>>>>>>> ...). > >>>>>>>>>>>> > >>>>>>>>>>>> So, it looks like sometimes, some target filters are not > updated > >>>> before > >>>>>>>>>>>> activating components ? or factory components ? > >>>>>>>>>>>> > >>>>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726. > >>>>>>>>>>>> Now, interestingly, I did the following patch and my > application > >>>> is now > >>>>>>>>>>>> working fine: In the AbstractComponentManager class, I > >>>> systematically > >>>>>>>>>>>> update target filters, like this: > >>>>>>>>>>>> > >>>>>>>>>>>> +++ > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>> > >> > src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java > >>>>>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >> > >> > >
