HI Pierre,
Do you by any chance see a log message at ERROR level like
Cannot create component instance due to failure to bind reference {0}
shortly before the timeout message?
I'm not sure if it's a good idea to dump the threads at WARN. If you think it
is I'll change it. Some people get upset if the logs are too long :-)
thanks
david jencks
On Oct 29, 2013, at 8:46 AM, Pierre De Rop <[email protected]> wrote:
> yes David, consistently. but I don't see stacktraces.
>
> Now, looking at the code, I see that this stacktrace is logged in DEBUG:
>
> final void dumpThreads()
> {
> try
> {
> String dump = new ThreadDump().call();
> log( LogService.LOG_DEBUG, dump, null );
>
> So, I will activate DEBUG and provide the stacktrace.
>
> But is this a bug to log this stacktrace in DEBUG ?
> I mean: should it be logged in WARN, instead of DEBUG ?
>
> /pierre
>
>
> On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <[email protected]>wrote:
>
>> Hi Pierre,
>>
>> I think there should be a thread dump in the log to go with this message.
>> If there is could you send it to me or attach it to FELIX-4297?
>>
>> Does this happen consistently?
>>
>> many thanks
>> david jencks
>>
>> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> wrote:
>>
>>> Hi David;
>>>
>>> 1) Every integration tests are passing OK.
>>>
>>> Regarding the OOM, I believe that the out of memory is likely to be cause
>>> by the following load tests, which are running in DEBUG mode:
>>>
>>>
>> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
>>> src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
>>> src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
>>>
>>>
>>> So, I just added this in each files (in static initializers):
>>>
>>> DS_LOGLEVEL = "warn";
>>>
>>> And I have no more OOM, with this nice test result message:
>>>
>>> Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
>>> [INFO] BUILD SUCCESS
>>>
>>>
>>>
>>> 2) Now, I did again a non regression tests with our applications, and I'm
>>> seeing a new WARN message (but it does not seem to prevent the
>> application
>>> to be correctly initialized: all components are in the end satisfied):
>>>
>>> (if you believe that it's now time to release, then you can go ahead, and
>>> we'll then investigate this new issue after the release ... I let you
>>> decide)
>>>
>>> ->
>>>
>>> 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
>>> pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi -
>>> [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager
>> :
>>> invokeBindMethod : timeout on open latch newtcpprocessor
>>>
>>>
>>>
>>>
>>> kind regards;
>>> /Pierre
>>>
>>>
>>> On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected]
>>> wrote:
>>>
>>>> Pierre,
>>>>
>>>> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the
>>>> problems I found (for 2). I don't see the OOM often enough to have any
>>>> confidence that anything I do would actually fix it, so I'm inclined to
>> do
>>>> nothing. Is that OK with you?
>>>>
>>>> Unless you can find some more problems :-) I'm planning to try another
>>>> release when the config admin 1.8 gets to maven central. I'm going to
>>>> update the pom to normally run against the CA 1.8 version supporting R5
>> and
>>>> change the profile so running against R4 requires specifying profiles
>>>> explicitly.
>>>>
>>>> thanks again!
>>>> david jencks
>>>>
>>>> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]>
>> wrote:
>>>>
>>>>> Hi Pierre,
>>>>>
>>>>> Much better to find these problems before a release than just after!
>>>>>
>>>>> I saw an OOM once recently but haven't been able to reproduce it.
>>>>>
>>>>> I'm looking into the NPE. I think I see the timing hole it is using
>> but
>>>> need to think about it some more.
>>>>>
>>>>> many thanks!
>>>>> david jencks
>>>>>
>>>>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]>
>>>> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Looking at our configurator component we are currently using (but we
>>>> will fix it in order to use the multi-location "?"), I see this:
>>>>>>
>>>>>> void configure(String pid, Dictionary pidConf) {
>>>>>> Configuration config = getConfiguration(_pid, null);
>>>>>> if (config.getBundleLocation() != null) {
>>>>>> config.setBundleLocation(null);
>>>>>> }
>>>>>> config.update(pidConf);
>>>>>> }
>>>>>>
>>>>>> So I believe that you are getting a null configuration because there
>> is
>>>> a short window between the setBundleLocation(null) (at this point, the
>>>> configuration is null) and the config.update(pidConf) call ...
>>>>>>
>>>>>> So, the good news is that I'm not having anymore some NPE using your
>>>> latest commits :-) and I think our application is now fully operational.
>>>>>>
>>>>>> but ... (please don't start to abominate me ) now, in order to do a
>>>> final check, I restarted the integration tests and there is still two
>>>> problems:
>>>>>>
>>>>>> 1) I'm sometimes getting some out of memory errors: this is probably
>>>> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
>>>> currently configured in DEBUG mode ?
>>>>>>
>>>>>> 2) I ran the tests two times, and the second time, I got this
>> exception
>>>> with the failing
>>>>>> Felix3680_2Test:
>>>>>>
>>>>>>
>>>>
>> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
>>>> Time elapsed: 36.597 sec <<< ERROR!
>>>>>> java.lang.NullPointerException
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
>>>>>> at
>>>>
>> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>>>>>> at
>>>>
>> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>>>>>> at
>>>>
>> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>>>>>> at
>>>> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>>>>>> at org.apache.felix.framework.Felix.access$000(Felix.java:74)
>>>>>> at
>>>> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
>>>>>> at
>>>>
>> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
>>>>>> at
>>>>
>> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
>>>>>> at
>>>>
>> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
>>>>>> at
>>>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> at
>>>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>
>>>>>> Are you also getting this exception ?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> /Pierre
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <[email protected]
>>>
>>>> wrote:
>>>>>> Hi PIerre,
>>>>>>
>>>>>> Looking at the CA spec it looks like CA is supposed to send out
>>>> CM_LOCATION_CHANGED events even before any properties are set when
>>>> setBundleLocation is called. I added some code to ignore these events.
>>>> Note that DS is "reserving" the configurations for (one of) the
>>>> component(s) that will be consuming them by calling
>> getConfiguration(pid).
>>>>>>
>>>>>> I do wonder how the location to something non-null on your
>>>> configurations before the properties are set.
>>>>>>
>>>>>> Waiting for the next bug :-)
>>>>>>
>>>>>> thanks
>>>>>> david jencks
>>>>>>
>>>>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]>
>>>> wrote:
>>>>>>
>>>>>>> Hello David,
>>>>>>>
>>>>>>> The code we are using to configure our components is old, at at the
>>>> time we wrote it, configadmin was not supporting multi-location. But I
>> do
>>>> agree, we can now use the "?" multi-location.
>>>>>>>
>>>>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
>>>> always):
>>>>>>>
>>>>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>> id=102,
>>>>
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>>>>
>>>>>>> java.lang.NullPointerException
>>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>>> at
>>>>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>>> at
>>>>
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
>>>>>>> at
>>>>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>>> at
>>>>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>>> at
>>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>>
>>>>>>>
>>>>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
>>>> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct,
>> then
>>>> Is this a normal behavior ?
>>>>>>>
>>>>>>> This is strange; perhaps I shall start a new integration test ?
>>>>>>>
>>>>>>> /Pierre
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <
>> [email protected]>
>>>> wrote:
>>>>>>> Hi Pierre,
>>>>>>>
>>>>>>> This pointed out a logic error I introduced for Felix 3651. I opened
>>>> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
>>>> found which I think explains the NPE. Could you check this?
>>>>>>>
>>>>>>> Could I ask what you are trying to do by setting the bundleLocation
>> to
>>>> null? If you want to allow any bundle to receive the configuration you
>>>> could use multi-location support and set the location to "?" With the
>> code
>>>> you have now, if the configuration is already in use by a DS component,
>> the
>>>> location changed event will result in the bundle location being reset
>> back
>>>> to what it was.
>>>>>>>
>>>>>>> thanks!
>>>>>>> david jencks
>>>>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> thanks; The fix is fixing the problem :-)
>>>>>>>>
>>>>>>>> but ... there's now a new different problem: i'm now sometimes
>>>> getting this
>>>>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
>>>>>>>>
>>>>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire
>> ConfigurationEvent:
>>>>>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
>>>>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>>>> id=102,
>>>>>>>>
>>>>
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>>>>>
>>>>>>>> java.lang.NullPointerException
>>>>>>>> at
>>>>>>>>
>>>>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>>>> at
>>>>>>>>
>>>>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>>>> at
>>>>>>>>
>>>>
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
>>>>>>>> at
>>>>>>>>
>>>>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>>>> at
>>>>>>>>
>>>>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>>>> at
>>>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>>>
>>>>>>>> Perhaps a new jira issue shall be opened ?
>>>>>>>>
>>>>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our
>>>>>>>> application, we populate configuration admin by doing something like
>>>> this:
>>>>>>>>
>>>>>>>> Configuration cfg = cm.getConfiguration(pid, null)
>>>>>>>> if (config.getBundleLocation() != null) {
>>>>>>>> config.setBundleLocation(null);
>>>>>>>> }
>>>>>>>>
>>>>>>>> The setBundleLocation(null) is probably useless, but this leads to a
>>>>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the
>> NPE.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday, October 25, 2013, David Jencks <[email protected]>
>>>> wrote:
>>>>>>>>> Hi Pierre,
>>>>>>>>>
>>>>>>>>> You are so good at writing useful tests!!
>>>>>>>>>
>>>>>>>>> I found a place to call setTargets(getProperties()) from inside
>>>>>>>> ComponentFactoryImpl that would have fewer side effects. Could you
>>>> see if
>>>>>>>> this makes your actual applications work properly? I'm uploading a
>>>>>>>> snapshot.
>>>>>>>>>
>>>>>>>>> many thanks
>>>>>>>>> david jencks
>>>>>>>>>
>>>>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <[email protected]
>>>
>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> Since this application is complex, I'm not able to provide logs
>>>> because
>>>>>>>>>> there are hundreds of components involved which are not mine, and
>>>> for
>>>>>>>> now,
>>>>>>>>>> I'm not able to diagnose the problem.
>>>>>>>>>>
>>>>>>>>>> But I have created FELIX-4290, and joined to it an integration
>> test
>>>> which
>>>>>>>>>> seems to reproduce the kind of problem I think I'm having in my
>>>>>>>>>> application. I also joined the proposed patch.
>>>>>>>>>>
>>>>>>>>>> I did not have time to test the patch you suggested regarding the
>>>>>>>>>> SingleComponentManager.reconfigure method, so let's continue to
>>>>>>>> investigate
>>>>>>>>>> using the jira issue and the test I attached to it.
>>>>>>>>>>
>>>>>>>>>> Thanks;
>>>>>>>>>>
>>>>>>>>>> /Pierre
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
>>>> [email protected]
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>>
>>>>>>>>>>> I believe you that this code path doesn't work :-)
>>>>>>>>>>>
>>>>>>>>>>> I think there should be a less invasive way to fix this. By any
>>>> chance
>>>>>>>>>>> can you get a debug-enabled log from when this problem occurs?
>> It
>>>> would
>>>>>>>>>>> help confirm my suspicions of what might be missing.
>>>>>>>>>>>
>>>>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a
>>>> check for
>>>>>>>>>>> m_factoryProperties here (line 561):
>>>>>>>>>>>
>>>>>>>>>>> // nothing to do if there is no configuration (see
>>>> FELIX-714)
>>>>>>>>>>> if ( configuration == null && m_configurationProperties
>> ==
>>>>>>>>>>> null )
>>>>>>>>>>> {
>>>>>>>>>>> log( LogService.LOG_DEBUG, "No configuration provided
>>>> (or
>>>>>>>>>>> deleted), nothing to do", null );
>>>>>>>>>>> return;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix
>> this
>>>>>>>> before
>>>>>>>>>>> the release.
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>> david jencks
>>>>>>>>>>>
>>>>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <
>> [email protected]
>>>>>
>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> (sorry to do all this noise while you are releasing ...)
>>>>>>>>>>>>
>>>>>>>>>>>> We are indeed using factory components; and today, I finally
>>>> found and
>>>>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and I'm
>>>> going
>>>>>>>>>>>> further on but now I'm facing another problem which I did not
>>>> have in
>>>>>>>> the
>>>>>>>>>>>> scr 1.6.2.
>>>>>>>>>>>>
>>>>>>>>>>>> So, I would like to discuss about this new problem with you
>>>> before you
>>>>>>>>>>> redo
>>>>>>>>>>>> a release, in order to decide if this problem (if there is
>> really
>>>> one
>>>>>>>> ?)
>>>>>>>>>>>> shall be addressed now or after the upcoming release ?
>>>>>>>>>>>>
>>>>>>>>>>>> So, in our application, we are extensively using factory
>>>> components
>>>>>>>>>>>> (@Component(factory=XXX")).
>>>>>>>>>>>> When we instantiate a factory component (using
>>>>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance()
>>>> method
>>>>>>>> some
>>>>>>>>>>>> additional component properties which may also contain some
>> target
>>>>>>>>>>> filters.
>>>>>>>>>>>>
>>>>>>>>>>>> This allows to dynamically configure the filter of some
>> References
>>>>>>>>>>> declared
>>>>>>>>>>>> in the factory component.
>>>>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using
>>>> trunk,
>>>>>>>> this
>>>>>>>>>>>> does not work all the time. Some target filters seem to be
>>>> correctly
>>>>>>>>>>>> configured, and some others are not (I'm not sure, actually,
>> it's
>>>> late
>>>>>>>>>>> ...).
>>>>>>>>>>>>
>>>>>>>>>>>> So, it looks like sometimes, some target filters are not updated
>>>> before
>>>>>>>>>>>> activating components ? or factory components ?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726.
>>>>>>>>>>>> Now, interestingly, I did the following patch and my application
>>>> is now
>>>>>>>>>>>> working fine: In the AbstractComponentManager class, I
>>>> systematically
>>>>>>>>>>>> update target filters, like this:
>>>>>>>>>>>>
>>>>>>>>>>>> +++
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>
>> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
>>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
>>