I think this is a great idea and would be happy to play the game. Regarding the collection: TREC has some benefit if somebody is going to do formal recall and precision computations; otherwise the choice doesn't matter much. The best Similarity for any given collection is likely to be specific to that collection, so if the point here is to pick the best DefaultSimilarity, the collection should be as representative of Lucene users' content as possible (I know this is probably impossible to achieve).
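To make the comparison concrete, here is a minimal sketch of what one candidate entry might look like, assuming the 1.4-era org.apache.lucene.search.Similarity API: a DefaultSimilarity subclass that damps the idf() term (the issue with idf() dominating the score is quoted further down). The class name and the square-root damping curve are illustrative assumptions, not a proposal.

    // Hypothetical bake-off candidate: a DefaultSimilarity subclass that
    // damps idf().  The class name and the sqrt damping are assumptions
    // for illustration only; the overridden method follows the 1.4-era
    // org.apache.lucene.search.Similarity API.
    import org.apache.lucene.search.DefaultSimilarity;

    public class DampedIdfSimilarity extends DefaultSimilarity {
        // Square-root damping keeps rare terms from dominating the score;
        // the exact curve here is just an example.
        public float idf(int docFreq, int numDocs) {
            return (float) Math.sqrt(super.idf(docFreq, numDocs));
        }
    }

Each demonstration searcher would then call setSimilarity(new DampedIdfSimilarity()) (or whatever its candidate is) while keeping the Analyzer and the single search field identical across systems, so only the Similarity varies.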
One possible danger in these kinds of bake-offs is that people who know the content will likely craft specific queries that are not reflective of real users. It would be good to at least have a standard set of queries that is tested against each implementation. Perhaps each person could contribute a set of test queries in addition to their Similarity, and the combined query set could be run against each implementation.

Finally, I'd suggest picking content that has multiple fields and allowing the individual implementations to decide how to search those fields -- just title and body would be enough. I would like to use my MaxDisjunctionQuery and see how it compares to other approaches (e.g., the default MultiFieldQueryParser, assuming somebody uses that in this test); a rough sketch of the multi-field setup appears below, after the quoted message.

Chuck

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 17, 2004 1:27 PM
> To: Lucene Developers List
> Subject: DefaultSimilarity 2.0?
>
> Chuck Williams wrote:
> > Another issue will likely be the tf() and idf() computations. I have a
> > similar desired relevance ranking and was not getting what I wanted due
> > to the idf() term dominating the score. [ ... ]
>
> Chuck has made a series of criticisms of the DefaultSimilarity
> implementation. Unfortunately it is difficult to evaluate these quickly,
> as doing so requires relevance judgements. Still, we should consider
> modifying DefaultSimilarity for the 2.0 release if there are easy
> improvements to be had. But how do we decide what's better?
>
> Perhaps we should perform a formal or semi-formal evaluation of various
> Similarity implementations on a reference collection. For example, for a
> formal evaluation we might use one of the TREC Web collections, which
> have associated queries and relevance judgements. Or, less formally, we
> could use a crawl of the ~5M pages in DMOZ (I would be glad to collect
> these using Nutch).
>
> This could work as follows:
> -- Different folks could download and index a reference collection,
> offering demonstration search systems. We would provide complete code.
> These systems would differ only in their Similarity implementation; all
> would use the same Analyzer and search only a single field.
> -- These folks could then announce their candidate implementations and
> let others run queries against them via HTTP. Different Similarity
> implementations could thus be publicly and interactively compared.
> -- Hopefully a consensus, or at least a healthy majority, would agree
> on which was the best implementation, and we could make that the
> default for Lucene 2.0.
>
> Are there folks (e.g., Chuck) who would be willing to play this game?
> Should we make it more formal, using, e.g., TREC? Does anyone have
> other ideas about how we should decide how to modify DefaultSimilarity?
>
> Doug
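For the title/body point above, the sketch below shows the stock MultiFieldQueryParser route over the two fields, assuming the 1.4-era API; the index path, query string, and analyzer choice are made-up examples. The max-based combination is only described in a comment, since MaxDisjunctionQuery is a patch rather than part of the core API.

    // Rough sketch of the stock multi-field approach over a title/body
    // index, assuming the 1.4-era API.  The index path and query string
    // are placeholders.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class MultiFieldDemo {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/reference/index");
            String[] fields = { "title", "body" };

            // Stock approach: expands the user query into a disjunction over
            // both fields, so a document matching in both title and body gets
            // roughly the *sum* of the per-field scores.
            Query q = MultiFieldQueryParser.parse("default similarity", fields,
                                                  new StandardAnalyzer());

            // A MaxDisjunctionQuery-style combination would instead score each
            // document by the *maximum* of the per-field scores (optionally
            // plus a small tie-break), so a match in both fields is not
            // double-counted.

            Hits hits = searcher.search(q);
            for (int i = 0; i < Math.min(10, hits.length()); i++) {
                System.out.println(hits.score(i) + "\t" + hits.doc(i).get("title"));
            }
            searcher.close();
        }
    }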