Re: Performance of never optimizing

Justus Pendleton Tue, 04 Nov 2008 22:05:10 -0800

On 05/11/2008, at 4:36 AM, Michael McCandless wrote:

If possible, you should try to use a larger corpus (eg Wikipedia) rather than multiply Reuters by N, which creates unnatural term frequency distribution.

I'll replicate the tests with the Wikipedia corpus over the next few days and regenerate the graphs to show the data points in addition to the curves. The data I am using comes from the output of the benchmark framework:

[java] Operation round mrg runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
[java] UnoptSearch_100_Par 0 21 100 230.4 0.43 29,517,680 44,834,816

I am plotting the "rec/s" which I am (possibly mistakenly) interpreting to mean "searches per second" as I asked for 100 searches and it took 0.43 seconds to perform them all.
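
As a rough sanity check on that interpretation, the sketch below assumes "rec/s" is simply the number of records divided by the elapsed seconds; that definition is an assumption here, not something the benchmark output states, and the function name is hypothetical. With the figures above, 100 / 0.43 comes to about 232.6, close to the reported 230.4. The small gap is consistent with elapsedSec being rounded to two decimals for display.

```python
def records_per_second(recs_per_run: int, elapsed_sec: float) -> float:
    """Throughput as records divided by elapsed seconds.

    Assumption: this mirrors how the "rec/s" column is derived;
    the benchmark framework itself is not consulted here.
    """
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return recs_per_run / elapsed_sec


# Values taken from the UnoptSearch_100_Par line quoted above.
rate = records_per_second(100, 0.43)
print(f"{rate:.1f} searches/s")
```

If the interpretation holds, the plotted value is a searches-per-second rate for the whole batch of 100, not a per-query latency.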

It's best to use a real query log, if possible, to run the queries. If you are expecting your production machines to have plenty of RAM to hold the index, then you should definitely run the queries through once and discard the results, to get everything loaded in RAM, including the OS caching all required data in its IO cache.
Not opening/closing a reader per search should change the graphs quite a bit (for the better) and hopefully change some of the odd things you are seeing (in the questions below).
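
The warm-up advice can be sketched as a small timing harness: run every query once with the results thrown away, then time a second pass against the same, already-open reader. This is only an illustration. The `search` callable stands in for whatever executes a query against one long-lived reader, and `timed_queries` is a hypothetical helper, not part of Lucene or its benchmark framework.

```python
import time
from typing import Callable, Iterable


def timed_queries(search: Callable[[str], object],
                  queries: Iterable[str],
                  warmup: bool = True) -> float:
    """Return searches per second for one timed pass over the queries."""
    qs = list(queries)
    if warmup:
        # Warm-up pass: results are discarded; the point is to populate
        # the process and OS caches before measuring.
        for q in qs:
            search(q)
    start = time.perf_counter()
    for q in qs:
        search(q)
    elapsed = time.perf_counter() - start
    return len(qs) / elapsed if elapsed > 0 else float("inf")


# Usage with a stand-in search function (hypothetical; a real run would
# call into a searcher that is opened once and reused for every query).
calls = []
rate = timed_queries(lambda q: calls.append(q), ["jira", "lucene", "index"])
print(f"{len(calls)} searches executed, {rate:.0f} searches/s in the timed pass")
```

Passing `warmup=False` gives the cold-cache figure, which may be the more relevant number for a deployment where the index does not fit in RAM.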

I don't believe our large users have enough memory for Lucene indexes to fit in RAM. (Especially given we use quite a bit of RAM for other stuff.) I think we also close readers pretty frequently (whenever any user updates a JIRA issue, which I assume is happening nearly constantly when you've got thousands of users). I was trying to mimic our usage as closely as I could to see whether Lucene behaves pathologically poorly given our current architecture. There have been some excellent suggestions about using in-memory indexes for recent updates but changes of that kind are, unfortunately, currently outside of my purview :-(

Given that our current usage may be suboptimal :-/ does anyone have any ideas about what may be causing the anomalies I identified earlier?


Cheers,
Justus
