yes David, consistently. but I don't see stacktraces.
Now, looking at the code, I see that this stacktrace is logged in DEBUG:
final void dumpThreads()
{
try
{
String dump = new ThreadDump().call();
log( LogService.LOG_DEBUG, dump, null );
So, I will activate DEBUG and provide the stacktrace.
But is this a bug to log this stacktrace in DEBUG ?
I mean: should it be logged in WARN, instead of DEBUG ?
/pierre
On Tue, Oct 29, 2013 at 4:28 PM, David Jencks <[email protected]>wrote:
> Hi Pierre,
>
> I think there should be a thread dump in the log to go with this message.
> If there is could you send it to me or attach it to FELIX-4297?
>
> Does this happen consistently?
>
> many thanks
> david jencks
>
> On Oct 29, 2013, at 3:43 AM, Pierre De Rop <[email protected]> wrote:
>
> > Hi David;
> >
> > 1) Every integration tests are passing OK.
> >
> > Regarding the OOM, I believe that the out of memory is likely to be cause
> > by the following load tests, which are running in DEBUG mode:
> >
> >
> src/test/java/org/apache/felix/scr/integration/ComponentConcurrencyTest.java
> > src/test/java/org/apache/felix/scr/integration/Felix3680Test.java
> > src/test/java/org/apache/felix/scr/integration/Felix3680_2Test.java
> >
> >
> > So, I just added this in each files (in static initializers):
> >
> > DS_LOGLEVEL = "warn";
> >
> > And I have no more OOM, with this nice test result message:
> >
> > Tests run: 103, Failures: 0, Errors: 0, Skipped: 0
> > [INFO] BUILD SUCCESS
> >
> >
> >
> > 2) Now, I did again a non regression tests with our applications, and I'm
> > seeing a new WARN message (but it does not seem to prevent the
> application
> > to be correctly initialized: all components are in the end satisfied):
> >
> > (if you believe that it's now time to release, then you can go ahead, and
> > we'll then investigate this new issue after the release ... I let you
> > decide)
> >
> > ->
> >
> > 2013-10-29 11:10:02,487 CM Event Dispatcher (Fire ConfigurationEvent:
> > pid=com.alcatel.as.ioh.impl.server.ServerFactoryImpl) ERROR osgi -
> > [com.alcatel.as.ioh.impl.server.ServerFactoryImpl(91)] DependencyManager
> :
> > invokeBindMethod : timeout on open latch newtcpprocessor
> >
> >
> >
> >
> > kind regards;
> > /Pierre
> >
> >
> > On Tue, Oct 29, 2013 at 12:13 AM, David Jencks <[email protected]
> >wrote:
> >
> >> Pierre,
> >>
> >> I opened https://issues.apache.org/jira/browse/FELIX-4297 and fixed the
> >> problems I found (for 2). I don't see the OOM often enough to have any
> >> confidence that anything I do would actually fix it, so I'm inclined to
> do
> >> nothing. Is that OK with you?
> >>
> >> Unless you can find some more problems :-) I'm planning to try another
> >> release when the config admin 1.8 gets to maven central. I'm going to
> >> update the pom to normally run against the CA 1.8 version supporting R5
> and
> >> change the profile so running against R4 requires specifying profiles
> >> explicitly.
> >>
> >> thanks again!
> >> david jencks
> >>
> >> On Oct 28, 2013, at 12:24 AM, David Jencks <[email protected]>
> wrote:
> >>
> >>> Hi Pierre,
> >>>
> >>> Much better to find these problems before a release than just after!
> >>>
> >>> I saw an OOM once recently but haven't been able to reproduce it.
> >>>
> >>> I'm looking into the NPE. I think I see the timing hole it is using
> but
> >> need to think about it some more.
> >>>
> >>> many thanks!
> >>> david jencks
> >>>
> >>> On Oct 27, 2013, at 2:58 AM, Pierre De Rop <[email protected]>
> >> wrote:
> >>>
> >>>> Hi David,
> >>>>
> >>>> Looking at our configurator component we are currently using (but we
> >> will fix it in order to use the multi-location "?"), I see this:
> >>>>
> >>>> void configure(String pid, Dictionary pidConf) {
> >>>> Configuration config = getConfiguration(_pid, null);
> >>>> if (config.getBundleLocation() != null) {
> >>>> config.setBundleLocation(null);
> >>>> }
> >>>> config.update(pidConf);
> >>>> }
> >>>>
> >>>> So I believe that you are getting a null configuration because there
> is
> >> a short window between the setBundleLocation(null) (at this point, the
> >> configuration is null) and the config.update(pidConf) call ...
> >>>>
> >>>> So, the good news is that I'm not having anymore some NPE using your
> >> latest commits :-) and I think our application is now fully operational.
> >>>>
> >>>> but ... (please don't start to abominate me ) now, in order to do a
> >> final check, I restarted the integration tests and there is still two
> >> problems:
> >>>>
> >>>> 1) I'm sometimes getting some out of memory errors: this is probably
> >> caused by the ComponentConcurrencyTest/Felix3680Test tests, which are
> >> currently configured in DEBUG mode ?
> >>>>
> >>>> 2) I ran the tests two times, and the second time, I got this
> exception
> >> with the failing
> >>>> Felix3680_2Test:
> >>>>
> >>>>
> >>
> test_concurrent_injection_with_bundleContext(org.apache.felix.scr.integration.Felix3680_2Test)
> >> Time elapsed: 36.597 sec <<< ERROR!
> >>>> java.lang.NullPointerException
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1710)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.SingleComponentManager.invokeUnbindMethod(SingleComponentManager.java:387)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:355)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.removedService(DependencyManager.java:290)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1503)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerRemoved(ServiceTracker.java:1398)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.untrack(ServiceTracker.java:1258)
> >>>> at
> >>
> org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1437)
> >>>> at
> >>
> org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:932)
> >>>> at
> >>
> org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
> >>>> at
> >>
> org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
> >>>> at
> >> org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260)
> >>>> at org.apache.felix.framework.Felix.access$000(Felix.java:74)
> >>>> at
> >> org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390)
> >>>> at
> >>
> org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148)
> >>>> at
> >>
> org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127)
> >>>> at
> >>
> org.apache.felix.scr.integration.components.felix3680_2.Main$RegistrationHelper$2.run(Main.java:136)
> >>>> at
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>> at java.lang.Thread.run(Thread.java:722)
> >>>>
> >>>> Are you also getting this exception ?
> >>>>
> >>>> thanks
> >>>>
> >>>> /Pierre
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Oct 26, 2013 at 6:34 PM, David Jencks <[email protected]
> >
> >> wrote:
> >>>> Hi PIerre,
> >>>>
> >>>> Looking at the CA spec it looks like CA is supposed to send out
> >> CM_LOCATION_CHANGED events even before any properties are set when
> >> setBundleLocation is called. I added some code to ignore these events.
> >> Note that DS is "reserving" the configurations for (one of) the
> >> component(s) that will be consuming them by calling
> getConfiguration(pid).
> >>>>
> >>>> I do wonder how the location to something non-null on your
> >> configurations before the properties are set.
> >>>>
> >>>> Waiting for the next bug :-)
> >>>>
> >>>> thanks
> >>>> david jencks
> >>>>
> >>>> On Oct 26, 2013, at 3:00 AM, Pierre De Rop <[email protected]>
> >> wrote:
> >>>>
> >>>>> Hello David,
> >>>>>
> >>>>> The code we are using to configure our components is old, at at the
> >> time we wrote it, configadmin was not supporting multi-location. But I
> do
> >> agree, we can now use the "?" multi-location.
> >>>>>
> >>>>> Now, I'm sorry but I'm still seeing another NPE (sometimes, not
> >> always):
> >>>>>
> >>>>> 2013-10-26 11:45:44,209 CM Event Dispatcher (Fire ConfigurationEvent:
> >> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
> >> configuration event to [org.osgi.service.cm.ConfigurationListener,
> id=102,
> >>
> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
> >>>>>
> >>>>> java.lang.NullPointerException
> >>>>> at
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
> >>>>> at
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
> >>>>> at
> >>
> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:344)
> >>>>> at
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
> >>>>> at
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
> >>>>> at
> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
> >>>>> at java.lang.Thread.run(Thread.java:722)
> >>>>>
> >>>>>
> >>>>> I'm not sure, but it seems that ConfigAdmin is providing a null
> >> dictionary, when delivering a CM_LOCATION_CHANGED event ? if correct,
> then
> >> Is this a normal behavior ?
> >>>>>
> >>>>> This is strange; perhaps I shall start a new integration test ?
> >>>>>
> >>>>> /Pierre
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Oct 26, 2013 at 9:54 AM, David Jencks <
> [email protected]>
> >> wrote:
> >>>>> Hi Pierre,
> >>>>>
> >>>>> This pointed out a logic error I introduced for Felix 3651. I opened
> >> https://issues.apache.org/jira/browse/FELIX-4293 and fixed the error I
> >> found which I think explains the NPE. Could you check this?
> >>>>>
> >>>>> Could I ask what you are trying to do by setting the bundleLocation
> to
> >> null? If you want to allow any bundle to receive the configuration you
> >> could use multi-location support and set the location to "?" With the
> code
> >> you have now, if the configuration is already in use by a DS component,
> the
> >> location changed event will result in the bundle location being reset
> back
> >> to what it was.
> >>>>>
> >>>>> thanks!
> >>>>> david jencks
> >>>>> On Oct 25, 2013, at 8:32 AM, Pierre De Rop <[email protected]>
> >> wrote:
> >>>>>
> >>>>>> Hi David,
> >>>>>>
> >>>>>> thanks; The fix is fixing the problem :-)
> >>>>>>
> >>>>>> but ... there's now a new different problem: i'm now sometimes
> >> getting this
> >>>>>> NPE, after SCR is receiving a CM_LOCATION_CHANGED event:
> >>>>>>
> >>>>>> 2013-10-25 16:11:44,674 CM Event Dispatcher (Fire
> ConfigurationEvent:
> >>>>>> pid=sipagent) ERROR osgi - [43] Unexpected problem delivering
> >>>>>> configuration event to [org.osgi.service.cm.ConfigurationListener,
> >> id=102,
> >>>>>>
> >>
> bundle=341/reference:file:/home/nxuser/pp/bundles/custo/org.apache.felix.scr.jar]
> >>>>>>
> >>>>>> java.lang.NullPointerException
> >>>>>> at
> >>>>>>
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.getProperties(ComponentFactoryImpl.java:226)
> >>>>>> at
> >>>>>>
> >>
> org.apache.felix.scr.impl.manager.ComponentFactoryImpl.configurationUpdated(ComponentFactoryImpl.java:396)
> >>>>>> at
> >>>>>>
> >>
> org.apache.felix.scr.impl.config.ConfigurationSupport.configurationEvent(ConfigurationSupport.java:390)
> >>>>>> at
> >>>>>>
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2032)
> >>>>>> at
> >>>>>>
> >>
> org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2002)
> >>>>>> at
> >> org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
> >>>>>> at java.lang.Thread.run(Thread.java:722)
> >>>>>>
> >>>>>> Perhaps a new jira issue shall be opened ?
> >>>>>>
> >>>>>> I think we are getting a CM_LOCATION_CHANGED event because in our
> >>>>>> application, we populate configuration admin by doing something like
> >> this:
> >>>>>>
> >>>>>> Configuration cfg = cm.getConfiguration(pid, null)
> >>>>>> if (config.getBundleLocation() != null) {
> >>>>>> config.setBundleLocation(null);
> >>>>>> }
> >>>>>>
> >>>>>> The setBundleLocation(null) is probably useless, but this leads to a
> >>>>>> CM_LOCATION_CHANGED event, which then sometimes ends up with the
> NPE.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Friday, October 25, 2013, David Jencks <[email protected]>
> >> wrote:
> >>>>>>> Hi Pierre,
> >>>>>>>
> >>>>>>> You are so good at writing useful tests!!
> >>>>>>>
> >>>>>>> I found a place to call setTargets(getProperties()) from inside
> >>>>>> ComponentFactoryImpl that would have fewer side effects. Could you
> >> see if
> >>>>>> this makes your actual applications work properly? I'm uploading a
> >>>>>> snapshot.
> >>>>>>>
> >>>>>>> many thanks
> >>>>>>> david jencks
> >>>>>>>
> >>>>>>> On Oct 24, 2013, at 6:17 AM, Pierre De Rop <[email protected]
> >
> >> wrote:
> >>>>>>>
> >>>>>>>> Hi David,
> >>>>>>>>
> >>>>>>>> Since this application is complex, I'm not able to provide logs
> >> because
> >>>>>>>> there are hundreds of components involved which are not mine, and
> >> for
> >>>>>> now,
> >>>>>>>> I'm not able to diagnose the problem.
> >>>>>>>>
> >>>>>>>> But I have created FELIX-4290, and joined to it an integration
> test
> >> which
> >>>>>>>> seems to reproduce the kind of problem I think I'm having in my
> >>>>>>>> application. I also joined the proposed patch.
> >>>>>>>>
> >>>>>>>> I did not have time to test the patch you suggested regarding the
> >>>>>>>> SingleComponentManager.reconfigure method, so let's continue to
> >>>>>> investigate
> >>>>>>>> using the jira issue and the test I attached to it.
> >>>>>>>>
> >>>>>>>> Thanks;
> >>>>>>>>
> >>>>>>>> /Pierre
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Oct 24, 2013 at 12:27 AM, David Jencks <
> >> [email protected]
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Pierre,
> >>>>>>>>>
> >>>>>>>>> I believe you that this code path doesn't work :-)
> >>>>>>>>>
> >>>>>>>>> I think there should be a less invasive way to fix this. By any
> >> chance
> >>>>>>>>> can you get a debug-enabled log from when this problem occurs?
> It
> >> would
> >>>>>>>>> help confirm my suspicions of what might be missing.
> >>>>>>>>>
> >>>>>>>>> FWIW I suspect SingleComponentManager.reconfigure is missing a
> >> check for
> >>>>>>>>> m_factoryProperties here (line 561):
> >>>>>>>>>
> >>>>>>>>> // nothing to do if there is no configuration (see
> >> FELIX-714)
> >>>>>>>>> if ( configuration == null && m_configurationProperties
> ==
> >>>>>>>>> null )
> >>>>>>>>> {
> >>>>>>>>> log( LogService.LOG_DEBUG, "No configuration provided
> >> (or
> >>>>>>>>> deleted), nothing to do", null );
> >>>>>>>>> return;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Unless we can't figure anything out for sure I'd prefer to fix
> this
> >>>>>> before
> >>>>>>>>> the release.
> >>>>>>>>>
> >>>>>>>>> thanks
> >>>>>>>>> david jencks
> >>>>>>>>>
> >>>>>>>>> On Oct 23, 2013, at 3:09 PM, Pierre De Rop <
> [email protected]
> >>>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi David,
> >>>>>>>>>>
> >>>>>>>>>> (sorry to do all this noise while you are releasing ...)
> >>>>>>>>>>
> >>>>>>>>>> We are indeed using factory components; and today, I finally
> >> found and
> >>>>>>>>>> fixed a cycle, using the Apache Service Diagnostic tool; and I'm
> >> going
> >>>>>>>>>> further on but now I'm facing another problem which I did not
> >> have in
> >>>>>> the
> >>>>>>>>>> scr 1.6.2.
> >>>>>>>>>>
> >>>>>>>>>> So, I would like to discuss about this new problem with you
> >> before you
> >>>>>>>>> redo
> >>>>>>>>>> a release, in order to decide if this problem (if there is
> really
> >> one
> >>>>>> ?)
> >>>>>>>>>> shall be addressed now or after the upcoming release ?
> >>>>>>>>>>
> >>>>>>>>>> So, in our application, we are extensively using factory
> >> components
> >>>>>>>>>> (@Component(factory=XXX")).
> >>>>>>>>>> When we instantiate a factory component (using
> >>>>>>>>>> ComponentFactory.newInstance()), We pass to the newInstance()
> >> method
> >>>>>> some
> >>>>>>>>>> additional component properties which may also contain some
> target
> >>>>>>>>> filters.
> >>>>>>>>>>
> >>>>>>>>>> This allows to dynamically configure the filter of some
> References
> >>>>>>>>> declared
> >>>>>>>>>> in the factory component.
> >>>>>>>>>> in the scr 1.6.2, this mechanism was working fine. But using
> >> trunk,
> >>>>>> this
> >>>>>>>>>> does not work all the time. Some target filters seem to be
> >> correctly
> >>>>>>>>>> configured, and some others are not (I'm not sure, actually,
> it's
> >> late
> >>>>>>>>> ...).
> >>>>>>>>>>
> >>>>>>>>>> So, it looks like sometimes, some target filters are not updated
> >> before
> >>>>>>>>>> activating components ? or factory components ?
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure but this might be related to the old FELIX-3726.
> >>>>>>>>>> Now, interestingly, I did the following patch and my application
> >> is now
> >>>>>>>>>> working fine: In the AbstractComponentManager class, I
> >> systematically
> >>>>>>>>>> update target filters, like this:
> >>>>>>>>>>
> >>>>>>>>>> +++
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> src/main/java/org/apache/felix/scr/impl/manager/AbstractComponentManager.java
> >>>>>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>