I would like to contribute a class based on the MoreLikeThis class in contrib/queries that generates a query based on the tags associated with a document. The class assumes that documents are tagged with a set of tags (which are stored in the index in a seperate Field). The class determines the top document terms associated with a given tag using the information gain metric.
While generating a MoreLikeThis query for a document the tags associated with document are used to determine the terms in the query. This class is useful for finding similar documents to a document that does not have many relevant terms but was tagged. I created a patch LUCENE-1910.patch that demonstrates this class, I would appreciate any feedback. Thomas --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org