Doron Cohen wrote:
It is very important that we be able to assess search quality in a
repeatable manner - so that anyone can repeat the quality tests, and
maybe find ways to improve them. (This would also allow us to verify the
"improvement claims" above...). This capability seems like a natural part
of the benchmark package. I started to look at extending the benchmark
package with a search quality module that would open an index (or first
create one), run a set of queries (similar to the performance benchmark),
and compute and report the well-known statistics mentioned above, and
more. Such a module depends on input data - documents, queries, and
judgements. And that's my second question. We don't have to invent this
data - TREC already has it, and the pool of judgements grows wider every
year. So, in principle, we could use TREC data.
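
To make the intended output concrete, here is a minimal sketch of the per-query
metric computation such a module might perform. The class and method names
(QualityMetrics, precisionAt, averagePrecision) are invented for illustration and
are not part of the benchmark package; in practice the ranked list would come from
running a TREC topic through an IndexSearcher, and the relevant set would be read
from the TREC qrels for that topic.

import java.util.*;

public class QualityMetrics {

  /** Precision at cutoff n: the fraction of the top n results that are
      relevant according to the judgements. */
  public static double precisionAt(List<String> ranked, Set<String> relevant, int n) {
    int hits = 0;
    int limit = Math.min(n, ranked.size());
    for (int i = 0; i < limit; i++) {
      if (relevant.contains(ranked.get(i))) hits++;
    }
    return n == 0 ? 0.0 : (double) hits / n;
  }

  /** Average precision for one query: the mean of the precision values at
      each rank where a relevant document appears. */
  public static double averagePrecision(List<String> ranked, Set<String> relevant) {
    if (relevant.isEmpty()) return 0.0;
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < ranked.size(); i++) {
      if (relevant.contains(ranked.get(i))) {
        hits++;
        sum += (double) hits / (i + 1);   // precision at rank i+1
      }
    }
    return sum / relevant.size();
  }

  public static void main(String[] args) {
    // Toy data standing in for one topic's ranked results and qrels.
    List<String> ranked = Arrays.asList("FT911-3", "FT911-7", "FT911-1", "FT911-9");
    Set<String> relevant = new HashSet<String>(Arrays.asList("FT911-3", "FT911-9"));
    System.out.println("P@3 = " + precisionAt(ranked, relevant, 3));   // 0.333...
    System.out.println("AP  = " + averagePrecision(ranked, relevant)); // 0.75
  }
}

Averaging averagePrecision over all topics gives MAP, the usual single-number
summary for comparing runs.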

We should be careful not to tune things too much for any one application and/or dataset. Tools to perform evaluation would clearly be valuable. But changes that improve Lucene's results on TREC data may or may not be of general utility. The best way to tune an application is to sample its query stream and evaluate those queries against its own documents.

That said, Lucene's scoring method has never been systematically tuned, and some judicious tuning based on TREC results would probably benefit a majority of Lucene applications. Ideally we can develop evaluation tools, use them on a variety of datasets to find better defaults for Lucene, and make the tools available so that folks can fine-tune things for their particular applications.
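
To make the "judicious tuning" concrete: assuming the classic (pre-4.0)
Similarity API, the scoring knobs are methods like tf() and lengthNorm() on
DefaultSimilarity, and a tuning experiment amounts to overriding one, re-running
the query set, and comparing the metrics. A sketch only - the particular formulas
below are invented for illustration, not proposed defaults:

import org.apache.lucene.search.DefaultSimilarity;

public class TunedSimilarity extends DefaultSimilarity {

  // Dampen term frequency harder than the default sqrt(freq).
  public float tf(float freq) {
    return (float) Math.log(1.0 + freq);
  }

  // Soften document length normalization relative to the default
  // 1/sqrt(numTokens). Note: lengthNorm is applied at index time, so each
  // variant requires re-indexing the collection before re-running queries.
  public float lengthNorm(String fieldName, int numTokens) {
    return (float) (1.0 / Math.pow(numTokens, 0.35));
  }
}

// Usage: searcher.setSimilarity(new TunedSimilarity()); then re-run the
// query set and compare precision/MAP against the stock DefaultSimilarity.

Whether a change like this is of general utility is exactly what running it over
several datasets would tell us.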

Doug
