Hi,

Relevance judgments are labor-intensive and expensive. Some Information Retrieval evaluation forums (TREC, CLEF, etc.) provide such golden sets, but they are not publicly available.
http://rosenfeldmedia.com/books/search-analytics/ talks about how to create a "golden set" for your top n queries. There is also some published work describing how to tune the parameters of a search system using click-through data. A rough sketch of how such a golden set can be scored is included after the quoted message below.

On Thursday, June 12, 2014 8:47 PM, Ivan Brusic <i...@brusic.com> wrote:

Perhaps more of an NLP question, but are there any tests regarding relevance for Lucene? Given an example corpus of documents, what are the golden sets for specific queries? The Wikipedia dump is used as a benchmarking tool for both indexing and querying in Lucene, but there are no metrics in terms of precision.

The Open Relevance project was closed yesterday (http://lucene.apache.org/openrelevance/), which is what prompted me to ask this question. Was the sub-project closed because others have found alternate solutions?

Relevancy is of course extremely context-dependent and subjective, but my hope is that there is an example catalog somewhere with defined golden sets.

Cheers,
Ivan
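
P.S. For what it's worth, once a golden set exists the scoring itself is simple. Below is a rough, hypothetical sketch in plain Java (not any existing Lucene API; the class and method names are my own) of mean precision@k over a map of per-query relevance judgments:

import java.util.*;

// Hypothetical sketch: mean precision@k of a ranked result list against a
// hand-built golden set of relevance judgments.
public class GoldenSetEval {

    // judgments: query -> document IDs judged relevant (the golden set)
    // results:   query -> ranked document IDs returned by the system under test
    static double meanPrecisionAtK(Map<String, Set<String>> judgments,
                                   Map<String, List<String>> results,
                                   int k) {
        double sum = 0.0;
        for (Map.Entry<String, Set<String>> entry : judgments.entrySet()) {
            Set<String> relevant = entry.getValue();
            List<String> ranked = results.getOrDefault(entry.getKey(),
                                                       Collections.<String>emptyList());
            int hits = 0;
            for (String docId : ranked.subList(0, Math.min(k, ranked.size()))) {
                if (relevant.contains(docId)) {
                    hits++;
                }
            }
            sum += (double) hits / k;
        }
        return judgments.isEmpty() ? 0.0 : sum / judgments.size();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> judgments = new HashMap<>();
        judgments.put("lucene scoring", new HashSet<>(Arrays.asList("doc3", "doc7")));

        Map<String, List<String>> results = new HashMap<>();
        results.put("lucene scoring", Arrays.asList("doc3", "doc1", "doc7"));

        // 2 of the top 3 results are judged relevant -> precision@3 = 0.666...
        System.out.println(meanPrecisionAtK(judgments, results, 3));
    }
}

Averaging per query keeps one heavily-judged query from dominating the score; MAP or NDCG could be substituted in the same loop if graded judgments are available.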