Given that Lucene is generally VERY CPU bound, having stalled processors implies that the threads are blocked, either on I/O or on a synchronized block, as long as you have more threads than processors.
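One rough way to tell the two cases apart is to count the query threads by state while the test is running. The sketch below is only an illustration: ThreadStateProbe and the "query-" thread name prefix are made-up names, not anything in Lucene, and it assumes the query threads were given a recognizable name. Threads waiting on a synchronized block report BLOCKED; threads stuck in a blocking read typically still report RUNNABLE, so idle CPUs together with few BLOCKED threads would point more toward I/O.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: tallies live JVM threads by state.
class ThreadStateProbe {

    // Counts threads per Thread.State, looking only at threads whose name
    // starts with namePrefix (an assumed naming convention for query threads).
    static Map<Thread.State, Integer> countStates(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();
        Thread[] workers = new Thread[4];
        // Hold the monitor so every worker has to wait for it.
        synchronized (lock) {
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(() -> {
                    synchronized (lock) {
                        // Stand-in for a synchronized read of stored fields.
                    }
                }, "query-" + i);
                workers[i].start();
            }
            Thread.sleep(200); // crude: give the workers time to reach the monitor
            System.out.println(countStates("query-"));
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

In a real test you would call countStates from a monitoring thread every second or so during the run. A consistently high BLOCKED count would suggest lock contention; it says nothing about which lock, for which a full thread dump is still needed.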
If the machine has a POOR disk subsystem in comparison to the CPU speed, and the OS disk cache is too small, you can easily stall the threads.

-----Original Message-----
From: Peter Keegan [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 18, 2006 1:32 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?

I'm returning 20 results (about .5Kb each). In fact, I had to reduce that
from 50 because the network was becoming the bottleneck. On the 16-cpu
server, I ran tests using 8, 16 and 32 query threads, but there was no
improvement with more threads. I still believe the hardware was to blame.

Peter

On 5/18/06, Robert Engels <[EMAIL PROTECTED]> wrote:
>
> As someone else pointed out, the proposed mods will only affect queries
> that return a lot of Documents. If your test is only set up to return a
> few documents (or possibly none at all), then you will see no difference.
>
> The fact that some of the CPUs were far less than 100%, and others were at
> 100% may be a good sign. How many query threads were you testing with?
>
> -----Original Message-----
> From: Peter Keegan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 18, 2006 1:01 PM
> To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
> Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?
>
>
> Robert,
>
> Sorry I missed your questions.
>
> > The test results seem hard to believe. Doubling the CPUs only increased
> > throughput by 20%??? Seems rather low for primarily a "read only" test.
>
>
> I think this refers to the test I did on a 16 cpu (32 hyperthreaded)
> server. This system was actually two 8 cpu systems cabled together on
> their backplanes. I suspect that some tradeoffs were made in its design
> that allowed for this flexibility which resulted in the minimal
> improvement in the tests.
>
> > Peter did not seem to answer many of the follow-up questions (at least I
> > could not find the answers) regarding whether or not the CPU usage was
> > 100%.
>
>
> On the 16-cpu system I noticed that load was not distributed very evenly -
> some were near 100%, others were less than 10%. On the AMD Opteron
> servers, the distribution was quite even and between 75-100%.
>
> > I look forward to your thoughts, and others - hopefully someone can run
> > the test on a multiple CPU machine.
>
>
> I built Lucene with your mods and ran my test on the 8 cpu AMD Linux
> server, but noticed no difference in query throughput. It would seem that
> ThreadLocal could improve performance, but I think my bottlenecks are
> elsewhere, like IndexInput.readVInt and inserting results in priority
> queues.
>
> Peter