Hi Pierre,

I think there should be a thread dump in the log to go with this message.  If 
there is could you send it to me or attach it to FELIX-4297?

Does this happen consistently?

many thanks
david jencks

On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> wrote:

> Hi David;
> 
> 1) Every integration tests are passing OK.
> 
> Regarding the OOM, I believe that the out of memory is likely to be cause
> by the following load tests, which are running in DEBUG mode:
> 
> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
> src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
> src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
> 
> 
> So, I just added this in each files (in static initializers):
> 
>        DS_LOGLEVEL = "warn";
> 
> And I have no more OOM, with this nice test result message:
> 
> Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
> [INFO] BUILD SUCCESS
> 
> 
> 
> 2) Now, I did again a non regression tests with our applications, and I'm
> seeing a new WARN message (but it does not seem to prevent the application
> to be correctly initialized: all components are in the end satisfied):
> 
> (if you believe that it's now time to release, then you can go ahead, and
> we'll then investigate this new issue after the release ... I let you
> decide)
> 
> ->
> 
> 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
> pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi  -
> [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager :
> invokeBindMethod : timeout on open latch newtcpprocessor
> 
> 
> 
> 
> kind regards;
> /Pierre
> 
> 
> On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected]>wrote:
> 
>> Pierre,
>> 
>> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the
>> problems I found (for 2).  I don't see the OOM often enough to have any
>> confidence that anything I do would actually fix it, so I'm inclined to do
>> nothing.  Is that OK with you?
>> 
>> Unless you can find some more problems :-)  I'm planning to try another
>> release when the config admin 1.8 gets to maven central.  I'm going to
>> update the pom to normally run against the CA 1.8 version supporting R5 and
>> change the profile so running against R4 requires specifying profiles
>> explicitly.
>> 
>> thanks again!
>> david jencks
>> 
>> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]> wrote:
>> 
>>> Hi Pierre,
>>> 
>>> Much better to find these problems before a release than just after!
>>> 
>>> I saw an OOM once recently but haven't been able to reproduce it.
>>> 
>>> I'm looking into the NPE.  I think I see the timing hole it is using but
>> need to think about it some more.
>>> 
>>> many thanks!
>>> david jencks
>>> 
>>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]>
>> wrote:
>>> 
>>>> Hi David,
>>>> 
>>>> Looking at our configurator component we are currently using (but we
>> will fix it in order to use the multi-location "?"), I see this:
>>>> 
>>>> void configure(String pid, Dictionary pidConf) {
>>>>    Configuration config = getConfiguration(_pid, null);
>>>>    if (config.getBundleLocation() != null) {
>>>>        config.setBundleLocation(null);
>>>>    }
>>>>    config.update(pidConf);
>>>> }
>>>> 
>>>> So I believe that you are getting a null configuration because there is
>> a short window between the setBundleLocation(null) (at this point, the
>> configuration is null) and the config.update(pidConf) call ...
>>>> 
>>>> So, the good news is that I'm not having anymore some NPE using your
>> latest commits :-) and I think our application is now fully operational.
>>>> 
>>>> but ... (please don't start to abominate me  ) now, in order to do a
>> final check, I restarted the integration tests and there is still two
>> problems:
>>>> 
>>>> 1) I'm sometimes getting some out of memory errors: this is probably
>> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
>> currently configured in DEBUG mode ?
>>>> 
>>>> 2) I ran the tests two times, and the second time, I got this exception
>> with the failing
>>>> Felix3680_2Test:
>>>> 
>>>> 
>> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
>> Time elapsed: 36.597 sec  <<< ERROR!
>>>> java.lang.NullPointerException
>>>>       at
>> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
>>>>       at
>> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
>>>>       at
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
>>>>       at
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
>>>>       at
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
>>>>       at
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
>>>>       at
>> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
>>>>       at
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
>>>>       at
>> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>>>>       at
>> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>>>>       at
>> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>>>>       at
>> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>>>>       at org.apache.felix.framework.Felix.access$000(Felix.java:74)
>>>>       at
>> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
>>>>       at
>> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
>>>>       at
>> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
>>>>       at
>> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
>>>>       at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>       at java.lang.Thread.run(Thread.java:722)
>>>> 
>>>> Are you also getting this exception ?
>>>> 
>>>> thanks
>>>> 
>>>> /Pierre
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <[email protected]>
>> wrote:
>>>> Hi PIerre,
>>>> 
>>>> Looking at the CA spec it looks like CA is supposed to send out
>> CM_LOCATION_CHANGED events even before any properties are set when
>> setBundleLocation is called.  I added some code to ignore these events.
>> Note that DS is "reserving" the configurations for (one of) the
>> component(s) that will be consuming them by calling getConfiguration(pid).
>>>> 
>>>> I do wonder how the location to something non-null on your
>> configurations before the properties are set.
>>>> 
>>>> Waiting for the next bug :-)
>>>> 
>>>> thanks
>>>> david jencks
>>>> 
>>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]>
>> wrote:
>>>> 
>>>>> Hello David,
>>>>> 
>>>>> The code we are using to configure our components is old, at at the
>> time we wrote it, configadmin was not supporting multi-location. But I do
>> agree, we can now use the "?" multi-location.
>>>>> 
>>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
>> always):
>>>>> 
>>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>> configuration event to [org.osgi.service.cm.ConfigurationListener, id=102,
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>> 
>>>>> java.lang.NullPointerException
>>>>>       at
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>       at
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>       at
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
>>>>>       at
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>       at
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>       at
>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>       at java.lang.Thread.run(Thread.java:722)
>>>>> 
>>>>> 
>>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
>> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct, then
>> Is this a normal behavior ?
>>>>> 
>>>>> This is strange; perhaps I shall start a new integration test ?
>>>>> 
>>>>> /Pierre
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <[email protected]>
>> wrote:
>>>>> Hi Pierre,
>>>>> 
>>>>> This pointed out a logic error I introduced for Felix 3651.  I opened
>> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
>> found which I think explains the NPE.  Could you check this?
>>>>> 
>>>>> Could I ask what you are trying to do by setting the bundleLocation to
>> null?  If you want to allow any bundle to receive the configuration you
>> could use multi-location support and set the location to "?"  With the code
>> you have now, if the configuration is already in use by a DS component, the
>> location changed event will result in the bundle location being reset back
>> to what it was.
>>>>> 
>>>>> thanks!
>>>>> david jencks
>>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]>
>> wrote:
>>>>> 
>>>>>> Hi David,
>>>>>> 
>>>>>> thanks; The fix is fixing the problem :-)
>>>>>> 
>>>>>> but ... there's now a new different problem: i'm now sometimes
>> getting this
>>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
>>>>>> 
>>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire ConfigurationEvent:
>>>>>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>> id=102,
>>>>>> 
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>>>>>> 
>>>>>> java.lang.NullPointerException
>>>>>>     at
>>>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>>>>>>     at
>>>>>> 
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>>>>>>     at
>>>>>> 
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
>>>>>>     at
>>>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>>>>>>     at
>>>>>> 
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>>>>>>     at
>> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>>>>>>     at java.lang.Thread.run(Thread.java:722)
>>>>>> 
>>>>>> Perhaps a new jira issue shall be opened ?
>>>>>> 
>>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our
>>>>>> application, we populate configuration admin by doing something like
>> this:
>>>>>> 
>>>>>> Configuration cfg = cm.getConfiguration(pid, null)
>>>>>> if (config.getBundleLocation() != null) {
>>>>>>   config.setBundleLocation(null);
>>>>>> }
>>>>>> 
>>>>>> The setBundleLocation(null) is probably useless, but this leads to a
>>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the NPE.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Friday, October 25, 2013, David Jencks <[email protected]>
>> wrote:
>>>>>>> Hi Pierre,
>>>>>>> 
>>>>>>> You are so good at writing useful tests!!
>>>>>>> 
>>>>>>> I found a place to call setTargets(getProperties()) from inside
>>>>>> ComponentFactoryImpl that would have fewer side effects.  Could you
>> see if
>>>>>> this makes your actual applications work properly?  I'm uploading a
>>>>>> snapshot.
>>>>>>> 
>>>>>>> many thanks
>>>>>>> david jencks
>>>>>>> 
>>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <[email protected]>
>> wrote:
>>>>>>> 
>>>>>>>> Hi David,
>>>>>>>> 
>>>>>>>> Since this application is complex, I'm not able to provide logs
>> because
>>>>>>>> there are hundreds of components involved which are not mine, and
>> for
>>>>>> now,
>>>>>>>> I'm not able to diagnose the problem.
>>>>>>>> 
>>>>>>>> But I have created FELIX-4290, and joined to it an integration test
>> which
>>>>>>>> seems to reproduce the kind of problem I think I'm having in my
>>>>>>>> application. I also joined the proposed patch.
>>>>>>>> 
>>>>>>>> I did not have time to test the patch you suggested regarding the
>>>>>>>> SingleComponentManager.reconfigure method, so let's continue to
>>>>>> investigate
>>>>>>>> using the jira issue and the test I attached to it.
>>>>>>>> 
>>>>>>>> Thanks;
>>>>>>>> 
>>>>>>>> /Pierre
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
>> [email protected]
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Pierre,
>>>>>>>>> 
>>>>>>>>> I believe you that this code path doesn't work :-)
>>>>>>>>> 
>>>>>>>>> I think there should be a less invasive way to fix this.  By any
>> chance
>>>>>>>>> can you get a debug-enabled log from when this problem occurs?  It
>> would
>>>>>>>>> help confirm my suspicions of what might be missing.
>>>>>>>>> 
>>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a
>> check for
>>>>>>>>> m_factoryProperties here (line 561):
>>>>>>>>> 
>>>>>>>>>         // nothing to do if there is no configuration (see
>> FELIX-714)
>>>>>>>>>         if ( configuration == null && m_configurationProperties ==
>>>>>>>>> null )
>>>>>>>>>         {
>>>>>>>>>             log( LogService.LOG_DEBUG, "No configuration provided
>> (or
>>>>>>>>> deleted), nothing to do", null );
>>>>>>>>>             return;
>>>>>>>>>         }
>>>>>>>>> 
>>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix this
>>>>>> before
>>>>>>>>> the release.
>>>>>>>>> 
>>>>>>>>> thanks
>>>>>>>>> david jencks
>>>>>>>>> 
>>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <[email protected]
>>> 
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi David,
>>>>>>>>>> 
>>>>>>>>>> (sorry to do all this noise while you are releasing ...)
>>>>>>>>>> 
>>>>>>>>>> We are indeed using factory components; and today, I finally
>> found and
>>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and I'm
>> going
>>>>>>>>>> further on but now I'm facing another problem which I did not
>> have in
>>>>>> the
>>>>>>>>>> scr 1.6.2.
>>>>>>>>>> 
>>>>>>>>>> So, I would like to discuss about this new problem with you
>> before you
>>>>>>>>> redo
>>>>>>>>>> a release, in order to decide if this problem (if there is really
>> one
>>>>>> ?)
>>>>>>>>>> shall be addressed now or after the upcoming release ?
>>>>>>>>>> 
>>>>>>>>>> So, in our application, we are extensively using factory
>> components
>>>>>>>>>> (@Component(factory=XXX")).
>>>>>>>>>> When we instantiate a factory component (using
>>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance()
>> method
>>>>>> some
>>>>>>>>>> additional component properties which may also contain some target
>>>>>>>>> filters.
>>>>>>>>>> 
>>>>>>>>>> This allows to dynamically configure the filter of some References
>>>>>>>>> declared
>>>>>>>>>> in the factory component.
>>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using
>> trunk,
>>>>>> this
>>>>>>>>>> does not work all the time. Some target filters seem to be
>> correctly
>>>>>>>>>> configured, and some others are not (I'm not sure, actually, it's
>> late
>>>>>>>>> ...).
>>>>>>>>>> 
>>>>>>>>>> So, it looks like sometimes, some target filters are not updated
>> before
>>>>>>>>>> activating components ? or factory components ?
>>>>>>>>>> 
>>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726.
>>>>>>>>>> Now, interestingly, I did the following patch and my application
>> is now
>>>>>>>>>> working fine: In the  AbstractComponentManager class, I
>> systematically
>>>>>>>>>> update target filters, like this:
>>>>>>>>>> 
>>>>>>>>>> +++
>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
>>>>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 

Reply via email to