Robert,

Sorry I missed your questions.

> The test results seem hard to believe. Doubling the CPUs only increased
> throughput by 20%??? Seems rather low for primarily a "read only" test.

I think this refers to the test I did on a 16-CPU (32 hyperthreaded) server. This system was actually two 8-CPU systems cabled together at their backplanes. I suspect that the design tradeoffs made to allow this flexibility are what limited the improvement in the tests.

> Peter did not seem to answer many of the follow-up questions (at least I
> could not find the answers) regarding whether or not the CPU usage was 100%.

On the 16-CPU system I noticed that the load was not distributed very evenly: some CPUs were near 100%, others were below 10%. On the AMD Opteron servers, the distribution was quite even, between 75% and 100%.

> I look forward to your thoughts, and others - hopefully someone can run the
> test on a multiple CPU machine.

I built Lucene with your mods and ran my test on the 8-CPU AMD Linux server, but noticed no difference in query throughput. It would seem that ThreadLocal could improve performance, but I think my bottlenecks are elsewhere, such as IndexInput.readVInt and inserting results into priority queues.

Peter