Thanks Pierre for submitting a unit test to FELIX-4866 that helped me enormously in identifying the issue.
I have fixed the bug in my code (without degrading performance) and at least your concurrency test, my concurrency tests and all the framework unit tests now consistently pass.

I would be very interested in hearing whether your bigger test suite also still behaves as expected.

Best regards,

David

On 14 May 2015 at 22:53, Pierre De Rop <pierre.de...@gmail.com> wrote:
> the thread dump did not help.
> I will investigate (may be a bug somewhere in my part; if this is the
> case, I would be sorry to make all this noise).
>
> hope to let you know soon.
>
> by the way, do you know how to run the SCR integration tests with the
> framework from the trunk ? I know that there are some SCR integration tests
> that are doing some load tests, and I would be interested to know if they
> are also ok with the framework from the trunk ?
>
> cheers;
> /Pierre
>
>
> On Thu, May 14, 2015 at 10:06 PM, David Bosschaert <
> david.bosscha...@gmail.com> wrote:
>
>> Hi Pierre,
>>
>> It would indeed be useful to find out more about why your test is
>> hanging. Maybe analysing a threaddump might give some more
>> information?
>>
>> Cheers,
>>
>> David
>>
>> On 14 May 2015 at 19:54, Pierre De Rop <pierre.de...@gmail.com> wrote:
>> > Thanks David; I just gave a try, and indeed the parallel test passed. I
>> > observed a gain of around 7/10%. The tool is described in [1].
>> >
>> > But I only have 4 cores on my laptop and I will make more tests in my lab
>> > at work (next week) where we have some servers having 32 or even 128
>> > processors. This will give a better idea of the gain because the more
>> > processors you have, the more synchronization is costly, so I could possibly
>> > observe a better performance gain.
>> >
>> > Now, I'm sorry but I think that there is still a problem (I don't know
>> > where): when using more threads, the parallel test does not complete and
>> > stops with a timeout message, indicating that the number of expected
>> > components are not created after a timeout delay of 1 minute.
>> >
>> > So, I just committed a modified version of the tool in the sandbox which
>> > can now take a -Dthreads option in order to configure the number of
>> > threads. With -Dthreads=4, it's OK. But with -Dthreads=10, then the test does
>> > not complete and ends with a timeout:
>> >
>> > $ java -Dthreads=10 -server -jar bin/felix.jar
>> >
>> > g! Starting benchmarks (each tested bundle will add/remove 630 components
>> > during bundle activation).
>> >
>> > [Starting benchmarks with no processing done in components start
>> > methods]
>> >
>> > Benchmarking bundle:
>> > org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel
>> > .................................................Could not start components
>> > timely: current start latch=2, stop latch=630
>> >
>> > My current understanding of this is that some components are still awaiting
>> > for unsatisfied service dependencies, just like if a service tracker would
>> > have missed a service registration.
>> >
>> > I ran the same test during two hours with the previous framework version,
>> > and did not observe any problems.
>> >
>> > I wonder if someone else do have another tool in order to perform another
>> > kind of load test, just to see if some problems are also observed.
>> >
>> > -> from my side, I will do the following: in the past, the benchmark tool
>> > supported not only dependencymanager, but also Felix SCR and iPojo. So, I
>> > will reintroduce Felix SCR in the benchmark and will check if I also
>> > observe the problem (with -Dthreads=10).
>> >
>> > I will let you know.
>> >
>> > cheers;
>> > /Pierre
>> >
>> > [1]
>> > http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README
>> >
>> > On Thu, May 14, 2015 at 3:41 PM, David Bosschaert <
>> > david.bosscha...@gmail.com> wrote:
>> >
>> >> I've fixed this now in
>> >> svn.apache.org/viewvc?view=revision&revision=1679367
>> >>
>> >> Pierre, your loadtest now runs to completion - thanks for reporting
>> >> this issue! I can see that the results for the parallel tests are a
>> >> little bit different than before, but I'm not sure how to read them so
>> >> I'll leave the interpretation of that to you :)
>> >>
>> >> Cheers,
>> >>
>> >> David
>> >>
>> >> On 14 May 2015 at 14:38, David Bosschaert <david.bosscha...@gmail.com>
>> >> wrote:
>> >> > I think I know what this is. I had some additional changes exactly in
>> >> > this area that I simply forgot to apply this morning. I should have it
>> >> > fixed sometime today.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > David
>> >> >
>> >> > On 14 May 2015 at 14:03, David Bosschaert <david.bosscha...@gmail.com>
>> >> > wrote:
>> >> >> Hi Pierre,
>> >> >>
>> >> >> I'll take a look today.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> David
>> >> >>
>> >> >> On 14 May 2015 at 14:00, Pierre De Rop <pierre.de...@gmail.com> wrote:
>> >> >>> I just committed the benchmark tool in
>> >> >>> http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/, if you can
>> >> >>> take a look.
>> >> >>>
>> >> >>> To run the scenario:
>> >> >>>
>> >> >>> - install jdk8:
>> >> >>>
>> >> >>> [nxuser@nx0012 pderop]$ java -version
>> >> >>> java version "1.8.0_40"
>> >> >>> Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
>> >> >>> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
>> >> >>>
>> >> >>> - checkout the loadtest from
>> >> >>> http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/
>> >> >>>
>> >> >>> - go to the "loadtest" directory and start the test, just like this:
>> >> >>>
>> >> >>> $ java -server -jar bin/felix.jar
>> >> >>> Welcome to Apache Felix Gogo
>> >> >>>
>> >> >>> g! Starting benchmarks (each tested bundle will add/remove 630 components
>> >> >>> during bundle activation).
>> >> >>>
>> >> >>> [Starting benchmarks with no processing done in components start
>> >> >>> methods]
>> >> >>>
>> >> >>> Benchmarking bundle:
>> >> >>> org.apache.felix.dependencymanager.benchmark.dependencymanager
>> >> >>> ..................................................
>> >> >>> -> results in nanos: [139,129,744 | 143,957,687 | 152,157,581 | 319,631,722
>> >> >>> | 919,838,078]
>> >> >>>
>> >> >>> Benchmarking bundle:
>> >> >>> org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .
>> >> >>>
>> >> >>>
>> >> >>> Here, the first
>> >> >>> "org.apache.felix.dependencymanager.benchmark.dependencymanager" test
>> >> >>> (single-threaded) passes OK. But the next one hangs
>> >> >>> (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel).
>> >> >>> it uses a fork join pool with size=4.
>> >> >>>
>> >> >>> and when typing "log warn", we see:
>> >> >>>
>> >> >>> "log warn"
>> >> >>>
>> >> >>> 2015.05.14 13:56:10 ERROR - Bundle:
>> >> >>> org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel -
>> >> >>> [ForkJoinPool-1-worker-3] Error processing tasks -
>> >> >>> java.util.ConcurrentModificationException
>> >> >>>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
>> >> >>>     at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
>> >> >>>     at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>> >> >>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
>> >> >>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
>> >> >>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
>> >> >>>     at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
>> >> >>>     at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
>> >> >>>     at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
>> >> >>>     at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
>> >> >>>     at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
>> >> >>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
>> >> >>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
>> >> >>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
>> >> >>>     at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
>> >> >>>     at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
>> >> >>>     at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
>> >> >>>     at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
>> >> >>>     at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
>> >> >>>     at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
>> >> >>>     at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
>> >> >>>     at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
>> >> >>>     at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
>> >> >>>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>> >> >>>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>> >> >>>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
>> >> >>>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
>> >> >>>
>> >> >>>
>> >> >>> (I will investigate also in my code to check if the problem does not come
>> >> >>> from me ?)
>> >> >>>
>> >> >>> cheers;
>> >> >>> /Pierre
>> >> >>>
>> >> >>>
>> >> >>> On Thu, May 14, 2015 at 1:47 PM, Pierre De Rop <pierre.de...@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Hi David,
>> >> >>>>
>> >> >>>> I don't know if it's me (a bug in my benchmark tool) or if there is a
>> >> >>>> regression somewhere in the framework, but my parallel test does not pass
>> >> >>>> anymore.
>> >> >>>>
>> >> >>>> The test first starts with a single-threaded scenario, which passes OK
>> >> >>>> (org.apache.felix.dependencymanager.benchmark.dependencymanager), then when
>> >> >>>> the parallel test starts
>> >> >>>> (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel)
>> >> >>>> it suddenly hangs, and when I type "log warn" under the gogo shell, I see
>> >> >>>> the following exception:
>> >> >>>>
>> >> >>>> (I'm using java8):
>> >> >>>>
>> >> >>>> $ java -server -Xmx4g -Xms4g -jar bin/felix.jar
>> >> >>>> ____________________________
>> >> >>>> Welcome to Apache Felix Gogo
>> >> >>>>
>> >> >>>> Benchmarking bundle:
>> >> >>>> org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .
>> >> >>>>
>> >> >>>> (here, the dependencymanager.parallel test hangs and when I type "log
>> >> >>>> warn", I see this:)
>> >> >>>>
>> >> >>>> g! log warn
>> >> >>>> 2015.05.14 13:31:03 ERROR - Bundle:
>> >> >>>> org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel -
>> >> >>>> [ForkJoinPool-1-worker-3] Error processing tasks -
>> >> >>>> java.util.ConcurrentModificationException
>> >> >>>>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
>> >> >>>>     at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
>> >> >>>>     at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>> >> >>>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
>> >> >>>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
>> >> >>>>     at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
>> >> >>>>     at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
>> >> >>>>     at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
>> >> >>>>     at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
>> >> >>>>     at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
>> >> >>>>     at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
>> >> >>>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
>> >> >>>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
>> >> >>>>     at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
>> >> >>>>     at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
>> >> >>>>     at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
>> >> >>>>     at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
>> >> >>>>     at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
>> >> >>>>     at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
>> >> >>>>     at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
>> >> >>>>     at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
>> >> >>>>     at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
>> >> >>>>     at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
>> >> >>>>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>> >> >>>>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>> >> >>>>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
>> >> >>>>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
>> >> >>>>
>> >> >>>> (If I configure my threadpool to 1, I have no problems, but with
>> >> >>>> threadpool=4, then I have the problem)
>> >> >>>>
>> >> >>>> I will investigate, but Ideally, may be it would be helpful if you could
>> >> >>>> also run the test by yourself; so I will commit soon something to reproduce
>> >> >>>> the problem in my sandbox.
>> >> >>>>
>> >> >>>> cheers;
>> >> >>>> /Pierre
>> >> >>>>
>> >> >>>> On Thu, May 14, 2015 at 11:11 AM, David Bosschaert <
>> >> >>>> david.bosscha...@gmail.com> wrote:
>> >> >>>>
>> >> >>>>> I've committed this now in
>> >> >>>>> http://svn.apache.org/viewvc?view=revision&revision=1679327
>> >> >>>>>
>> >> >>>>> Curious to see what others are measuring. My tests were focused on
>> >> >>>>> multiple bundles/threads obtaining the same service, as that's where I
>> >> >>>>> saw a bit of contention.
>> >> >>>>>
>> >> >>>>> Cheers,
>> >> >>>>>
>> >> >>>>> David
>> >> >>>>>
>> >> >>>>> On 13 May 2015 at 15:10, Pierre De Rop <pierre.de...@gmail.com> wrote:
>> >> >>>>> > Hi David,
>> >> >>>>> >
>> >> >>>>> > I'm looking forward to test your improvements using the dependencymanager
>> >> >>>>> > benchmark tool ([1]).
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > [1]
>> >> >>>>> > http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/
>> >> >>>>> >
>> >> >>>>> > /Pierre
>> >> >>>>> >
>> >> >>>>> > On Wed, May 13, 2015 at 3:02 PM, David Bosschaert <
>> >> >>>>> > david.bosscha...@gmail.com> wrote:
>> >> >>>>> >
>> >> >>>>> >> I have implemented the performance improvements that I was thinking of
>> >> >>>>> >> using Java 5 concurrency tools, they can be viewed at [1].
>> >> >>>>> >>
>> >> >>>>> >> I wrote a little performance test suite [2] that tests multithreaded
>> >> >>>>> >> service registry performance (10 threads) from single / multiple
>> >> >>>>> >> bundles with either singleton services or Prototype Service Factory
>> >> >>>>> >> services and the results are quite impressive. I'm getting performance
>> >> >>>>> >> improvements compared to the current trunk from 8 times better than
>> >> >>>>> >> the original (800%) to more than 30 times better (3000%).
>> >> >>>>> >>
>> >> >>>>> >> Carsten has already reviewed the code (thanks Carsten!) and I'm
>> >> >>>>> >> planning to commit it to Felix tomorrow if nobody objects.
>> >> >>>>> >>
>> >> >>>>> >> Cheers,
>> >> >>>>> >>
>> >> >>>>> >> David
>> >> >>>>> >>
>> >> >>>>> >> [1]
>> >> >>>>> >> https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450
>> >> >>>>> >> [2]
>> >> >>>>> >> https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf
>> >> >>>>> >>
>> >> >>>>> >> On 23 March 2015 at 15:39, Richard S. Hall <he...@ungoverned.org> wrote:
>> >> >>>>> >> > On 3/23/15 10:17 , David Bosschaert wrote:
>> >> >>>>> >> >>
>> >> >>>>> >> >> On 23 March 2015 at 13:39, Richard S. Hall <he...@ungoverned.org> wrote:
>> >> >>>>> >> >>>
>> >> >>>>> >> >>> On 3/23/15 03:55 , Guillaume Nodet wrote:
>> >> >>>>> >> >>>>
>> >> >>>>> >> >>>> There's a call to interrupt() in Felix#acquireBundleLock(), not sure if
>> >> >>>>> >> >>>> it can be the culprit though.
>> >> >>>>> >> >>>> Interrupts could also be caused by a bundle being shutdown while one of
>> >> >>>>> >> >>>> its threads is waiting for a service, which is a valid use case imho.
>> >> >>>>> >> >>>> Anyway, I think sanely reacting to a thread being interrupted would be
>> >> >>>>> >> >>>> good.
>> >> >>>>> >> >>>
>> >> >>>>> >> >>>
>> >> >>>>> >> >>> Yes, threads can be interrupted if they are holding a bundle lock and the
>> >> >>>>> >> >>> global lock holder needs the bundle lock.
>> >> >>>>> >> >>>
>> >> >>>>> >> >>> I admit that I do not recall why we ignore the interrupt here, but didn't
>> >> >>>>> >> >>> we implement service lookup so that a bundle lock wasn't necessary? I
>> >> >>>>> >> >>> thought we just checked for the validity of the bundle context before
>> >> >>>>> >> >>> returning or something. Perhaps we felt there was no reason to be
>> >> >>>>> >> >>> interrupted in that case. I really don't know.
>> >> >>>>> >> >>
>> >> >>>>> >> >> I think that the Service Registry could be rewritten to be completely
>> >> >>>>> >> >> free of synchronized blocks using the Java 5 concurrency libraries,
>> >> >>>>> >> >
>> >> >>>>> >> >
>> >> >>>>> >> > Well, that just moves the sync blocks to the library, but yeah sure.
>> >> >>>>> >> >
>> >> >>>>> >> >> which I think would really be a better approach. There is too much
>> >> >>>>> >> >> locking going on in the current SR implementation IMHO.
>> >> >>>>> >> >
>> >> >>>>> >> >
>> >> >>>>> >> > I don't really think there is too much, but it is complicated.
>> >> >>>>> >> > Unfortunately, it is complicated to make sure that locks aren't held while
>> >> >>>>> >> > doing service lookups and this is complicated because you can run into
>> >> >>>>> >> > cycles, etc.
>> >> >>>>> >> >
>> >> >>>>> >> > But feel free to try to simplify it.
>> >> >>>>> >> >
>> >> >>>>> >> >>
>> >> >>>>> >> >> This brings the question: can we move to Java 5 (or Java 6) for the
>> >> >>>>> >> >> Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
>> >> >>>>> >> >> I would be surprised if there is anyone who still needs a JDK that
>> >> >>>>> >> >> went end-of-life 7 years ago.
>> >> >>>>> >> >
>> >> >>>>> >> >
>> >> >>>>> >> > At this point, it doesn't really matter to me.
>> >> >>>>> >> >
>> >> >>>>> >> > -> richard
>> >> >>>>> >> >
>> >> >>>>> >> >>
>> >> >>>>> >> >> Best regards,
>> >> >>>>> >> >>
>> >> >>>>> >> >> David
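
For readers following the stack traces quoted in this thread: they show `HashMap$KeyIterator.next` throwing `ConcurrentModificationException` inside `AbstractCollection.addAll`, called from `CapabilitySet.match`. The sketch below is not the Felix code and not the fix that was committed (the thread does not describe it). It only illustrates, under the assumption that a map was modified while its key set was being copied, why a plain `HashMap` fails fast where a `ConcurrentHashMap` does not. The class name `KeySnapshotSketch`, its methods and the key names are hypothetical, and the modification is done on the same thread so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names come from Felix.
class KeySnapshotSketch {

    // Fills the given map with five made-up entries and returns it.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        for (int i = 0; i < 5; i++) {
            map.put("service-" + i, i);
        }
        return map;
    }

    // Copies the map's keys into a list and adds one new key after the first
    // key has been read, imitating a registration that arrives in the middle
    // of a lookup. Returns true if the copy completed, false if the iterator
    // failed fast with a ConcurrentModificationException.
    static boolean copyKeysWhileMutating(Map<String, Integer> map) {
        List<String> snapshot = new ArrayList<>();
        try {
            for (String key : map.keySet()) {
                snapshot.add(key);
                if (snapshot.size() == 1) {
                    map.put("late-registration", 99); // structural change mid-iteration
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap copy completed: "
                + copyKeysWhileMutating(fill(new HashMap<>())));
        System.out.println("ConcurrentHashMap copy completed: "
                + copyKeysWhileMutating(fill(new ConcurrentHashMap<>())));
    }
}
```

With a `HashMap` the copy is aborted by the fail-fast iterator, so the first line reports `false`. The `ConcurrentHashMap` iterator is weakly consistent and never throws this exception, so the second line reports `true`. Whether the late key shows up in the snapshot is not guaranteed in that case, which is a trade-off any lookup code would have to accept.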