David, I just attached the stacktrace to the FELIX-4297

On Tue, Oct 29, 2013 at 4:46 PM, Pierre De Rop <[email protected]>wrote:

> yes David, consistently. but I don't see stacktraces.
>
> Now, looking at the code, I see that this stacktrace is logged in DEBUG:
>
>     final void dumpThreads()
>     {
>         try
>         {
>             String dump = new ThreadDump().call();
>             log( LogService.LOG_DEBUG, dump, null );
>
> So, I will activate DEBUG and provide the stacktrace.
>
> But is this a bug to log this stacktrace in DEBUG ?
> I mean: should it be logged in WARN, instead of DEBUG ?
>
> /pierre
>
>
> On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <[email protected]>wrote:
>
>> Hi Pierre,
>>
>> I think there should be a thread dump in the log to go with this message.
>>  If there is could you send it to me or attach it to FELIX-4297?
>>
>> Does this happen consistently?
>>
>> many thanks
>> david jencks
>>
>> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]>
>> wrote:
>>
>> > Hi David;
>> >
>> > 1) Every integration tests are passing OK.
>> >
>> > Regarding the OOM, I believe that the out of memory is likely to be
>> cause
>> > by the following load tests, which are running in DEBUG mode:
>> >
>> >
>> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
>> > src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
>> > src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
>> >
>> >
>> > So, I just added this in each files (in static initializers):
>> >
>> >        DS_LOGLEVEL = "warn";
>> >
>> > And I have no more OOM, with this nice test result message:
>> >
>> > Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
>> > [INFO] BUILD SUCCESS
>> >
>> >
>> >
>> > 2) Now, I did again a non regression tests with our applications, and
>> I'm
>> > seeing a new WARN message (but it does not seem to prevent the
>> application
>> > to be correctly initialized: all components are in the end satisfied):
>> >
>> > (if you believe that it's now time to release, then you can go ahead,
>> and
>> > we'll then investigate this new issue after the release ... I let you
>> > decide)
>> >
>> > ->
>> >
>> > 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
>> > pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi  -
>> > [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)]
>> DependencyManager :
>> > invokeBindMethod : timeout on open latch newtcpprocessor
>> >
>> >
>> >
>> >
>> > kind regards;
>> > /Pierre
>> >
>> >
>> > On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected]
>> >wrote:
>> >
>> >> Pierre,
>> >>
>> >> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed
>> the
>> >> problems I found (for 2).  I don't see the OOM often enough to have any
>> >> confidence that anything I do would actually fix it, so I'm inclined
>> to do
>> >> nothing.  Is that OK with you?
>> >>
>> >> Unless you can find some more problems :-)  I'm planning to try another
>> >> release when the config admin 1.8 gets to maven central.  I'm going to
>> >> update the pom to normally run against the CA 1.8 version supporting
>> R5 and
>> >> change the profile so running against R4 requires specifying profiles
>> >> explicitly.
>> >>
>> >> thanks again!
>> >> david jencks
>> >>
>> >> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]>
>> wrote:
>> >>
>> >>> Hi Pierre,
>> >>>
>> >>> Much better to find these problems before a release than just after!
>> >>>
>> >>> I saw an OOM once recently but haven't been able to reproduce it.
>> >>>
>> >>> I'm looking into the NPE.  I think I see the timing hole it is using
>> but
>> >> need to think about it some more.
>> >>>
>> >>> many thanks!
>> >>> david jencks
>> >>>
>> >>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]>
>> >> wrote:
>> >>>
>> >>>> Hi David,
>> >>>>
>> >>>> Looking at our configurator component we are currently using (but we
>> >> will fix it in order to use the multi-location "?"), I see this:
>> >>>>
>> >>>> void configure(String pid, Dictionary pidConf) {
>> >>>>    Configuration config = getConfiguration(_pid, null);
>> >>>>    if (config.getBundleLocation() != null) {
>> >>>>        config.setBundleLocation(null);
>> >>>>    }
>> >>>>    config.update(pidConf);
>> >>>> }
>> >>>>
>> >>>> So I believe that you are getting a null configuration because there
>> is
>> >> a short window between the setBundleLocation(null) (at this point, the
>> >> configuration is null) and the config.update(pidConf) call ...
>> >>>>
>> >>>> So, the good news is that I'm not having anymore some NPE using your
>> >> latest commits :-) and I think our application is now fully
>> operational.
>> >>>>
>> >>>> but ... (please don't start to abominate me  ) now, in order to do a
>> >> final check, I restarted the integration tests and there is still two
>> >> problems:
>> >>>>
>> >>>> 1) I'm sometimes getting some out of memory errors: this is probably
>> >> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
>> >> currently configured in DEBUG mode ?
>> >>>>
>> >>>> 2) I ran the tests two times, and the second time, I got this
>> exception
>> >> with the failing
>> >>>> Felix3680_2Test:
>> >>>>
>> >>>>
>> >>
>> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
>> >> Time elapsed: 36.597 sec  <<< ERROR!
>> >>>> java.lang.NullPointerException
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
>> >>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
>> >>>>       at
>> >>
>> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
>> >>>>       at
>> >>
>> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
>> >>>>       at
>> >>
>> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
>> >>>>       at
>> >> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
>> >>>>       at org.apache.felix.framework.Felix.access$000(Felix.java:74)
>> >>>>       at
>> >> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
>> >>>>       at
>> >>
>> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
>> >>>>       at
>> >>
>> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
>> >>>>       at
>> >>
>> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
>> >>>>       at
>> >>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>>       at
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>>       at java.lang.Thread.run(Thread.java:722)
>> >>>>
>> >>>> Are you also getting this exception ?
>> >>>>
>> >>>> thanks
>> >>>>
>> >>>> /Pierre
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <
>> [email protected]>
>> >> wrote:
>> >>>> Hi PIerre,
>> >>>>
>> >>>> Looking at the CA spec it looks like CA is supposed to send out
>> >> CM_LOCATION_CHANGED events even before any properties are set when
>> >> setBundleLocation is called.  I added some code to ignore these events.
>> >> Note that DS is "reserving" the configurations for (one of) the
>> >> component(s) that will be consuming them by calling
>> getConfiguration(pid).
>> >>>>
>> >>>> I do wonder how the location to something non-null on your
>> >> configurations before the properties are set.
>> >>>>
>> >>>> Waiting for the next bug :-)
>> >>>>
>> >>>> thanks
>> >>>> david jencks
>> >>>>
>> >>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]>
>> >> wrote:
>> >>>>
>> >>>>> Hello David,
>> >>>>>
>> >>>>> The code we are using to configure our components is old, at at the
>> >> time we wrote it, configadmin was not supporting multi-location. But I
>> do
>> >> agree, we can now use the "?" multi-location.
>> >>>>>
>> >>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
>> >> always):
>> >>>>>
>> >>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire
>> ConfigurationEvent:
>> >> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>> >> configuration event to [org.osgi.service.cm.ConfigurationListener,
>> id=102,
>> >>
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>> >>>>>
>> >>>>> java.lang.NullPointerException
>> >>>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>> >>>>>       at
>> >>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>> >>>>>       at
>> >>
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
>> >>>>>       at
>> >>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>> >>>>>       at
>> >>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>> >>>>>       at
>> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>> >>>>>       at java.lang.Thread.run(Thread.java:722)
>> >>>>>
>> >>>>>
>> >>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
>> >> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct,
>> then
>> >> Is this a normal behavior ?
>> >>>>>
>> >>>>> This is strange; perhaps I shall start a new integration test ?
>> >>>>>
>> >>>>> /Pierre
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <
>> [email protected]>
>> >> wrote:
>> >>>>> Hi Pierre,
>> >>>>>
>> >>>>> This pointed out a logic error I introduced for Felix 3651.  I
>> opened
>> >> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
>> >> found which I think explains the NPE.  Could you check this?
>> >>>>>
>> >>>>> Could I ask what you are trying to do by setting the bundleLocation
>> to
>> >> null?  If you want to allow any bundle to receive the configuration you
>> >> could use multi-location support and set the location to "?"  With the
>> code
>> >> you have now, if the configuration is already in use by a DS
>> component, the
>> >> location changed event will result in the bundle location being reset
>> back
>> >> to what it was.
>> >>>>>
>> >>>>> thanks!
>> >>>>> david jencks
>> >>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]>
>> >> wrote:
>> >>>>>
>> >>>>>> Hi David,
>> >>>>>>
>> >>>>>> thanks; The fix is fixing the problem :-)
>> >>>>>>
>> >>>>>> but ... there's now a new different problem: i'm now sometimes
>> >> getting this
>> >>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
>> >>>>>>
>> >>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire
>> ConfigurationEvent:
>> >>>>>> pid=sipagent) ERROR osgi  - [43] Unexpected problem delivering
>> >>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
>> >> id=102,
>> >>>>>>
>> >>
>> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
>> >>>>>>
>> >>>>>> java.lang.NullPointerException
>> >>>>>>     at
>> >>>>>>
>> >>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
>> >>>>>>     at
>> >>>>>>
>> >>
>> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
>> >>>>>>     at
>> >>>>>>
>> >>
>> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
>> >>>>>>     at
>> >>>>>>
>> >>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
>> >>>>>>     at
>> >>>>>>
>> >>
>> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
>> >>>>>>     at
>> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
>> >>>>>>     at java.lang.Thread.run(Thread.java:722)
>> >>>>>>
>> >>>>>> Perhaps a new jira issue shall be opened ?
>> >>>>>>
>> >>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our
>> >>>>>> application, we populate configuration admin by doing something
>> like
>> >> this:
>> >>>>>>
>> >>>>>> Configuration cfg = cm.getConfiguration(pid, null)
>> >>>>>> if (config.getBundleLocation() != null) {
>> >>>>>>   config.setBundleLocation(null);
>> >>>>>> }
>> >>>>>>
>> >>>>>> The setBundleLocation(null) is probably useless, but this leads to
>> a
>> >>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the
>> NPE.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Friday, October 25, 2013, David Jencks <[email protected]>
>> >> wrote:
>> >>>>>>> Hi Pierre,
>> >>>>>>>
>> >>>>>>> You are so good at writing useful tests!!
>> >>>>>>>
>> >>>>>>> I found a place to call setTargets(getProperties()) from inside
>> >>>>>> ComponentFactoryImpl that would have fewer side effects.  Could you
>> >> see if
>> >>>>>> this makes your actual applications work properly?  I'm uploading a
>> >>>>>> snapshot.
>> >>>>>>>
>> >>>>>>> many thanks
>> >>>>>>> david jencks
>> >>>>>>>
>> >>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <
>> [email protected]>
>> >> wrote:
>> >>>>>>>
>> >>>>>>>> Hi David,
>> >>>>>>>>
>> >>>>>>>> Since this application is complex, I'm not able to provide logs
>> >> because
>> >>>>>>>> there are hundreds of components involved which are not mine, and
>> >> for
>> >>>>>> now,
>> >>>>>>>> I'm not able to diagnose the problem.
>> >>>>>>>>
>> >>>>>>>> But I have created FELIX-4290, and joined to it an integration
>> test
>> >> which
>> >>>>>>>> seems to reproduce the kind of problem I think I'm having in my
>> >>>>>>>> application. I also joined the proposed patch.
>> >>>>>>>>
>> >>>>>>>> I did not have time to test the patch you suggested regarding the
>> >>>>>>>> SingleComponentManager.reconfigure method, so let's continue to
>> >>>>>> investigate
>> >>>>>>>> using the jira issue and the test I attached to it.
>> >>>>>>>>
>> >>>>>>>> Thanks;
>> >>>>>>>>
>> >>>>>>>> /Pierre
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
>> >> [email protected]
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Pierre,
>> >>>>>>>>>
>> >>>>>>>>> I believe you that this code path doesn't work :-)
>> >>>>>>>>>
>> >>>>>>>>> I think there should be a less invasive way to fix this.  By any
>> >> chance
>> >>>>>>>>> can you get a debug-enabled log from when this problem occurs?
>>  It
>> >> would
>> >>>>>>>>> help confirm my suspicions of what might be missing.
>> >>>>>>>>>
>> >>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a
>> >> check for
>> >>>>>>>>> m_factoryProperties here (line 561):
>> >>>>>>>>>
>> >>>>>>>>>         // nothing to do if there is no configuration (see
>> >> FELIX-714)
>> >>>>>>>>>         if ( configuration == null && m_configurationProperties
>> ==
>> >>>>>>>>> null )
>> >>>>>>>>>         {
>> >>>>>>>>>             log( LogService.LOG_DEBUG, "No configuration
>> provided
>> >> (or
>> >>>>>>>>> deleted), nothing to do", null );
>> >>>>>>>>>             return;
>> >>>>>>>>>         }
>> >>>>>>>>>
>> >>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix
>> this
>> >>>>>> before
>> >>>>>>>>> the release.
>> >>>>>>>>>
>> >>>>>>>>> thanks
>> >>>>>>>>> david jencks
>> >>>>>>>>>
>> >>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <
>> [email protected]
>> >>>
>> >>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi David,
>> >>>>>>>>>>
>> >>>>>>>>>> (sorry to do all this noise while you are releasing ...)
>> >>>>>>>>>>
>> >>>>>>>>>> We are indeed using factory components; and today, I finally
>> >> found and
>> >>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and
>> I'm
>> >> going
>> >>>>>>>>>> further on but now I'm facing another problem which I did not
>> >> have in
>> >>>>>> the
>> >>>>>>>>>> scr 1.6.2.
>> >>>>>>>>>>
>> >>>>>>>>>> So, I would like to discuss about this new problem with you
>> >> before you
>> >>>>>>>>> redo
>> >>>>>>>>>> a release, in order to decide if this problem (if there is
>> really
>> >> one
>> >>>>>> ?)
>> >>>>>>>>>> shall be addressed now or after the upcoming release ?
>> >>>>>>>>>>
>> >>>>>>>>>> So, in our application, we are extensively using factory
>> >> components
>> >>>>>>>>>> (@Component(factory=XXX")).
>> >>>>>>>>>> When we instantiate a factory component (using
>> >>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance()
>> >> method
>> >>>>>> some
>> >>>>>>>>>> additional component properties which may also contain some
>> target
>> >>>>>>>>> filters.
>> >>>>>>>>>>
>> >>>>>>>>>> This allows to dynamically configure the filter of some
>> References
>> >>>>>>>>> declared
>> >>>>>>>>>> in the factory component.
>> >>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using
>> >> trunk,
>> >>>>>> this
>> >>>>>>>>>> does not work all the time. Some target filters seem to be
>> >> correctly
>> >>>>>>>>>> configured, and some others are not (I'm not sure, actually,
>> it's
>> >> late
>> >>>>>>>>> ...).
>> >>>>>>>>>>
>> >>>>>>>>>> So, it looks like sometimes, some target filters are not
>> updated
>> >> before
>> >>>>>>>>>> activating components ? or factory components ?
>> >>>>>>>>>>
>> >>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726.
>> >>>>>>>>>> Now, interestingly, I did the following patch and my
>> application
>> >> is now
>> >>>>>>>>>> working fine: In the  AbstractComponentManager class, I
>> >> systematically
>> >>>>>>>>>> update target filters, like this:
>> >>>>>>>>>>
>> >>>>>>>>>> +++
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>
>> >>
>> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
>> >>>>>>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
>

Reply via email to