If you can get much higher throughput from a single-threaded client, then you should queue the requests in the server and process them from a single thread (or a small pool of threads).
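
A minimal sketch of that idea, assuming the server currently runs each search on whatever thread accepted the socket connection. The class and method names (QueuedSearchServer, submit, peakConcurrency) are hypothetical and not part of Lucene; the fake "search" is just a short sleep standing in for the real call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: connection threads enqueue work, a small pool runs it.
final class QueuedSearchServer {
    // A fixed pool keeps an unbounded internal queue, so any number of
    // connection threads can submit while only a few searches execute.
    private final ExecutorService workers;

    QueuedSearchServer(int workerThreads) {
        this.workers = Executors.newFixedThreadPool(workerThreads);
    }

    // Called from each connection thread; returns at once with a Future.
    <T> Future<T> submit(Callable<T> search) {
        return workers.submit(search);
    }

    void shutdown() {
        workers.shutdown();
    }

    // Demo only: submits fake searches and reports the highest number that
    // were ever executing at the same time.
    static int peakConcurrency(int requests, int workerThreads) {
        QueuedSearchServer server = new QueuedSearchServer(workerThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                pending.add(server.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(1); // stand-in for the real search call
                    running.decrementAndGet();
                    return id;
                }));
            }
            for (Future<Integer> f : pending) {
                f.get(); // wait for every queued request to finish
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            server.shutdown();
        }
        return peak.get();
    }
}
```

With something like this, 50 or more client connections can stay open while the number of searches actually running is capped at the pool size. What that size should be is a guess to be tuned by measurement; the number of CPUs is a common starting point.
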
That you can get much higher throughput from a single-threaded client also seems to imply that Lucene (or at least your packaging) is NOT very concurrent, since more threads actually reduce efficiency.

-----Original Message-----
From: Peter Keegan [mailto:[EMAIL PROTECTED]
Sent: Friday, May 19, 2006 9:03 AM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?

The queries are mostly boolean (all AND'd terms); the number of terms varies
anywhere from a few to 25 or more, with 1 or 2 sort fields.

My tests are designed to measure total query throughput, not just raw search
speed. The client test program blasts queries from >50 threads over a socket
and runs on a separate server from Lucene. I can get much higher rates by just
blasting from a single thread in the client, but this doesn't simulate the
real use model.

Peter

On 5/19/06, Robert Engels <[EMAIL PROTECTED]> wrote:
>
> As an aside,
>
> On my VERY crappy 1.2 GHz single CPU P4, using an index of 300k documents, I
> can perform 50 searches per second (returning 20 document matches each).
> This includes the time to serialize and send the results to the client
> (although the client is on the same machine, but it also competes for cpu
> with the search server).
>
> Based on some informal viewing of the CPU usage, the client consumes 50-70%
> of the cpu, so I would assume that moving the client off the server should
> double the queries per second (although there would be additional delay due
> to network transmission). So even a single CPU P4 could easily do 100
> queries per second.
>
> Even though we are comparing apples and oranges, unless you are performing
> some really expensive queries, I would expect your configuration to be MUCH
> faster.
>
> -----Original Message-----
> From: Peter Keegan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 18, 2006 1:32 PM
> To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
> Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?
>
>
> I'm returning 20 results (about .5Kb each). In fact, I had to reduce that
> from 50 because the network was becoming the bottleneck.
>
> On the 16-cpu server, I ran tests using 8, 16 and 32 query threads, but
> there was no improvement with more threads. I still believe the hardware was
> to blame.
>
> Peter
>
> On 5/18/06, Robert Engels <[EMAIL PROTECTED]> wrote:
> >
> > As someone else pointed out, the proposed mods will only affect queries
> > that return a lot of Documents. If your test is only set up to return a few
> > documents (or possibly none at all), then you will see no difference.
> >
> > The fact that some of the CPUs were far less than 100%, and others were at
> > 100% may be a good sign. How many query threads were you testing with?
> >
> > -----Original Message-----
> > From: Peter Keegan [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, May 18, 2006 1:01 PM
> > To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
> > Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?
> >
> >
> > Robert,
> >
> > Sorry I missed your questions.
> >
> > > The test results seem hard to believe. Doubling the CPUs only increased
> > > throughput by 20%??? Seems rather low for primarily a "read only" test.
> >
> > I think this refers to the test I did on a 16 cpu (32 hyperthreaded)
> > server. This system was actually two 8 cpu systems cabled together on their
> > backplanes. I suspect that some tradeoffs were made in its design that
> > allowed for this flexibility which resulted in the minimal improvement in
> > the tests.
> >
> > > Peter did not seem to answer many of the follow-up questions (at least I
> > > could not find the answers) regarding whether or not the CPU usage was
> > > 100%.
> >
> > On the 16-cpu system I noticed that load was not distributed very evenly -
> > some were near 100%, others were less than 10%. On the AMD Opteron servers,
> > the distribution was quite even and between 75-100%.
> >
> > > I look forward to your thoughts, and others - hopefully someone can run
> > > the test on a multiple CPU machine.
> >
> > I built Lucene with your mods and ran my test on the 8 cpu AMD Linux
> > server, but noticed no difference in query throughput. It would seem that
> > ThreadLocal could improve performance, but I think my bottlenecks are
> > elsewhere, like IndexInput.readVInt and inserting results in priority
> > queues.
> >
> > Peter
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]