Re: search quality - assessment & improvements

Grant Ingersoll Mon, 25 Jun 2007 11:57:03 -0700


On Jun 25, 2007, at 2:04 PM, Doug Cutting wrote:

Doron Cohen wrote:
It is very important that we would be able to assess the search quality in a
repeatable manner - so that anyone can repeat the quality tests, and
maybe find ways to improve them. (This would also allow to verify the
"improvements claims" above...). This capability seems like a natural part
of the benchmark package. I started to look at extending the benchmark
package with search quality module, that would open an index (or first
create one), run a set of queries (similar to the performance benchmark),
and compute and report the set of known statistics mentioned above and
more. Such a module depends on input data - documents, queries, and
judgements. And that's my second question. We don't have to invent this
data - TREC has it already, and it is getting wider every year as there are
more judgements. So, theoretically we could use TREC data.
We should be careful not to tune things too much for any one application and/or dataset. Tools to perform evaluation would clearly be valuable. But changes that improve Lucene's results on TREC data may or may not be of general utility. The best way to tune an application is to sample its query stream and evaluate these against its documents.
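
For concreteness, here is a rough sketch of the kind of statistics such a quality module might report. The measures are not listed in this message, so precision at k and mean average precision are assumed here as typical TREC-style measures. All function names are hypothetical, and the sketch is plain Python rather than anything in the benchmark package.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision values taken at each rank where a relevant doc appears.

    Assumes `ranked` contains no duplicate document ids.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Divide by all relevant docs, so relevant docs never retrieved count as misses.
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [
        average_precision(ranked, judgments.get(query_id, set()))
        for query_id, ranked in runs.items()
    ]
    return sum(scores) / len(scores)
```

Here `runs` maps a query id to the ranked document ids a search returned, and `judgments` maps the same query id to the set of documents judged relevant. Both could come from TREC data or from an application's own sampled query stream.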

+1. To do this, we could use Reuters or Wikipedia. The hard part is generating the queries and having people make relevance judgments for a sufficient sample size. Over time it would get better, and if we had a nice way for people to add queries/judgments w/o going through the patch/commit process (maybe a page on the wiki could hold the queries and judgments? That could get tricky), we might get more support from outsiders.
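
If the judgments did live on a wiki page as plain text, one option would be the four-column TREC qrels layout (topic, iteration, document id, relevance). That choice is an assumption for illustration, not something decided in this thread. A minimal parsing sketch follows; the function name and the `#` comment convention are hypothetical.

```python
from typing import Dict, Iterable, Set


def parse_qrels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse qrels-style lines into {topic: set of relevant document ids}.

    Each data line is expected to hold four whitespace-separated fields:
    topic, iteration (ignored), document id, relevance (integer, > 0 means relevant).
    Blank lines and lines starting with '#' are skipped (the '#' rule is an
    assumption made for hand-edited pages, not part of the TREC format).
    """
    relevant: Dict[str, Set[str]] = {}
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 4:
            raise ValueError(f"line {line_number}: expected 4 fields, got {len(parts)}")
        topic, _iteration, doc_id, relevance = parts
        # Register the topic even if none of its judged documents are relevant.
        docs = relevant.setdefault(topic, set())
        if int(relevance) > 0:
            docs.add(doc_id)
    return relevant
```

Failing loudly on malformed lines seems preferable for contributed data, since a silently skipped judgment would change the reported numbers.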

That said, Lucene's scoring method has never been systematically tuned, and some judicious tuning based on TREC results would probably benefit a majority of Lucene applications. Ideally we can develop evaluation tools, use them on a variety of datasets to find better defaults for Lucene, and make the tools available so that folks can fine-tune things for their particular applications.


+1 as well.
