Doron Cohen wrote:
It is very important that we be able to assess search quality in a
repeatable manner - so that anyone can repeat the quality tests, and
maybe find ways to improve them. (This would also allow us to verify the
"improvement claims" above...). This capability seems like a natural part
of the benchmark package. I started to look at extending the benchmark
package with a search quality module that would open an index (or first
create one), run a set of queries (similar to the performance benchmark),
and compute and report the well-known statistics mentioned above, and
more. Such a module depends on input data - documents, queries, and
judgements. And that's my second question. We don't have to invent this
data - TREC already has it, and the pool of judgements grows wider every
year. So, in principle, we could use TREC data.
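
To make the intended output concrete, here is a minimal sketch of the per-query
metric computation such a module might perform. The class and method names
(QualityMetrics, precisionAt, averagePrecision) are invented for illustration and
are not part of the benchmark package; in practice the ranked list would come from
running a TREC topic through an IndexSearcher, and the relevant set would be read
from the TREC qrels for that topic.

import java.util.*;

public class QualityMetrics {

  /** Precision at cutoff n: the fraction of the top n results that are
      relevant according to the judgements. */
  public static double precisionAt(List<String> ranked, Set<String> relevant, int n) {
    int hits = 0;
    int limit = Math.min(n, ranked.size());
    for (int i = 0; i < limit; i++) {
      if (relevant.contains(ranked.get(i))) hits++;
    }
    return n == 0 ? 0.0 : (double) hits / n;
  }

  /** Average precision for one query: the mean of the precision values at
      each rank where a relevant document appears. */
  public static double averagePrecision(List<String> ranked, Set<String> relevant) {
    if (relevant.isEmpty()) return 0.0;
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < ranked.size(); i++) {
      if (relevant.contains(ranked.get(i))) {
        hits++;
        sum += (double) hits / (i + 1);   // precision at rank i+1
      }
    }
    return sum / relevant.size();
  }

  public static void main(String[] args) {
    // Toy data standing in for one topic's ranked results and qrels.
    List<String> ranked = Arrays.asList("FT911-3", "FT911-7", "FT911-1", "FT911-9");
    Set<String> relevant = new HashSet<String>(Arrays.asList("FT911-3", "FT911-9"));
    System.out.println("P@3 = " + precisionAt(ranked, relevant, 3));   // 0.333...
    System.out.println("AP  = " + averagePrecision(ranked, relevant)); // 0.75
  }
}

Averaging averagePrecision over all topics gives MAP, the usual single-number
summary for comparing runs.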

We should be careful not to tune things too much for any one application and/or dataset. Tools to perform evaluation would clearly be valuable. But changes that improve Lucene's results on TREC data may or may not be of general utility. The best way to tune an application is to sample its query stream and evaluate those queries against its own documents.

That said, Lucene's scoring method has never been systematically tuned, and some judicious tuning based on TREC results would probably benefit a majority of Lucene applications. Ideally we can develop evaluation tools, use them on a variety of datasets to find better defaults for Lucene, and make the tools available so that folks can fine-tune things for their particular applications.
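
To make the "judicious tuning" concrete: assuming the classic (pre-4.0)
Similarity API, the scoring knobs are methods like tf() and lengthNorm() on
DefaultSimilarity, and a tuning experiment amounts to overriding one, re-running
the query set, and comparing the metrics. A sketch only - the particular formulas
below are invented for illustration, not proposed defaults:

import org.apache.lucene.search.DefaultSimilarity;

public class TunedSimilarity extends DefaultSimilarity {

  // Dampen term frequency harder than the default sqrt(freq).
  public float tf(float freq) {
    return (float) Math.log(1.0 + freq);
  }

  // Soften document length normalization relative to the default
  // 1/sqrt(numTokens). Note: lengthNorm is applied at index time, so each
  // variant requires re-indexing the collection before re-running queries.
  public float lengthNorm(String fieldName, int numTokens) {
    return (float) (1.0 / Math.pow(numTokens, 0.35));
  }
}

// Usage: searcher.setSimilarity(new TunedSimilarity()); then re-run the
// query set and compare precision/MAP against the stock DefaultSimilarity.

Whether a change like this is of general utility is exactly what running it over
several datasets would tell us.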

Doug
