Hi Andrzej,
thanks for your corrections. I was simply trying to express my observations.
> This is slightly incorrect. The summaries are only accessed for the first page of results, not for all hits. So, no matter how many hits there are, only the currently displayed page needs the summaries.
You're right! I forgot to add: 10 results per page, and only 1 result per site(!).
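
Just to make the point concrete, here is a minimal sketch of what "summaries only for the displayed page" looks like, assuming a recent Lucene API (the index path and the "summary" stored field are placeholders, and the 1-result-per-site deduplication is not shown):

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class PageSummaries {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("index-merged")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Collect only one page of hits, however many documents match.
            TopDocs page = searcher.search(new MatchAllDocsQuery(), 10);
            for (ScoreDoc sd : page.scoreDocs) {
                // Stored fields (the summary) are read for these 10 docs only.
                Document doc = searcher.doc(sd.doc);
                System.out.println(doc.get("summary"));
            }
        }
    }
}

The key point is that stored fields are read for at most 10 documents per request, regardless of the total hit count.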
> So I would suggest using a static set of queries and an identical set of segments to generate the numbers. If you repeat the same query twice, of course the results will come back faster, because the relevant data will be loaded into the OS disk cache.
OK - between two runs you have to take care of this point. But I think that after 1000 different queries the OS cache will be of no use any more, right?
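
To keep runs comparable, something like the following would do: a static list of distinct queries, timed one by one, so the only caching effect left is what the OS keeps across queries (on newer Linux kernels one can also drop the page cache between runs via /proc/sys/vm/drop_caches). Again a sketch, assuming a recent Lucene API, with a placeholder index path, field name, and query list:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class QueryBench {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("index-merged")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("content", new StandardAnalyzer());
            // A static set of distinct queries, identical across benchmark runs.
            String[] queries = {"apache", "lucene performance", "disk cache"};
            for (String q : queries) {
                Query query = parser.parse(q);
                long t0 = System.nanoTime();
                searcher.search(query, 10);   // first page only
                long ms = (System.nanoTime() - t0) / 1_000_000;
                System.out.println(q + ": " + ms + " ms");
            }
        }
    }
}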
> Related to this, it is also better to use a single merged Lucene index than many per-segment indexes - the latter will work as well, but performance will be lower, and also there might be weird problems with scoring.
Without the merged index my system would not respond at all!
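
For completeness, merging the per-segment indexes into one is straightforward; here is a sketch assuming a recent Lucene API (method names may differ in older versions), with illustrative paths:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // Target directory for the single merged index.
        Directory merged = FSDirectory.open(Paths.get("index-merged"));
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(merged, cfg)) {
            // Per-segment source indexes (paths are illustrative).
            writer.addIndexes(
                FSDirectory.open(Paths.get("index-part-0")),
                FSDirectory.open(Paths.get("index-part-1")));
            writer.forceMerge(1);  // optionally collapse down to one segment
        }
    }
}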
But back to my numbers: besides the test itself, I used sar (from the sysstat package under Linux) to measure the system parameters. I see massive disk I/O on the data disk (not the swap partition!). I think this might be one bottleneck, which we should look at more closely.
Regards
Michael
