Andrzej Bialecki wrote:
>> How were the queries generated? From a log or randomly?
> The queries were picked manually from a real query log, to test the
> worst-performing cases.
So, for example, the 50% error rate might not be typical, but could be
worst-case.
>> When results differed greatly, did they look a lot worse?
> Yes. E.g., see the differences for MAX_HITS=10000.
The graph just shows that they differ, not how much better or worse they
are, since the baseline is not perfect. When the top-10 is 50%
different, are those five different hits markedly worse matches to your
eye than the five they've displaced, or are they comparable? That's what
really matters.
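As an aside, the top-k disagreement being discussed here can be quantified with a simple overlap measure. This is a hypothetical sketch, not code from either benchmark; the document ids are made up:

```python
# Hypothetical sketch: quantify how much an optimized searcher's top-k
# differs from the baseline's for one query. Note the caveat above:
# overlap only says the lists differ, not which list is better.

def topk_overlap(baseline, optimized, k=10):
    """Fraction of the baseline top-k that survives in the optimized top-k."""
    return len(set(baseline[:k]) & set(optimized[:k])) / k

# Made-up document ids for a single query:
baseline  = [3, 7, 1, 9, 4, 12, 5, 8, 20, 2]
optimized = [3, 7, 9, 4, 1, 15, 22, 30, 6, 41]

print(topk_overlap(baseline, optimized))  # -> 0.5: half the top-10 displaced
```

Averaging this over the whole query set would give a single "error rate" figure like the 50% mentioned above, while a side-by-side relevance inspection is still needed to judge whether the displaced hits are actually worse.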
> I forgot to mention that I don't use any of the Nutch code. Early on I
> decided to eliminate that part in order to measure raw Lucene
> performance first, while still using the Lucene queries that the
> translated Nutch queries correspond to.
What part of Nutch are you trying to avoid? Perhaps you could try
measuring your Lucene-only benchmark against a Nutch-based one. If they
don't differ markedly, then you can simply use Nutch, which makes for a
stronger benchmark. If they do differ, then we should figure out why.
> In several installations I use smaller slop values (around 20-40), but
> that choice is motivated by better-quality matches, not by performance,
> so I didn't test for it...
But that's a great reason to test for it! If lower slop can improve
result quality, then we should certainly see if it also makes
optimizations easier.
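To make the slop discussion concrete, here is a deliberately simplified two-term sketch with made-up term positions. Lucene's actual sloppy-phrase scoring is more involved (an edit-distance-style slop over all phrase terms), but the basic effect is the same: a smaller slop admits fewer candidate matches, which is why it could both improve quality and narrow the work an optimization has to do:

```python
# Simplified illustration (not Lucene code): a two-term sloppy phrase
# "matches" when some occurrence of term A has at most `slop` intervening
# positions before some occurrence of term B.

def within_slop(positions_a, positions_b, slop):
    """True if any occurrence of term A is within `slop` intervening
    positions of any occurrence of term B."""
    return any(abs(a - b) - 1 <= slop
               for a in positions_a
               for b in positions_b)

a_pos = [2, 40]   # positions of term A in a hypothetical document
b_pos = [30]      # positions of term B

print(within_slop(a_pos, b_pos, 20))  # True: positions 40 and 30 qualify
print(within_slop(a_pos, b_pos, 5))   # False: no pair within 5 positions
```

With slop 20 the document matches (positions 40 and 30 have 9 intervening positions); tightening slop to 5 rejects it, shrinking the match set.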
Doug