On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote:
>
> This might be a stupid question.
>
> When performing retrieval for a query, does Lucene first get
> a subset of candidate matches and then perform the ranking
> on that set? That is, is the similarity calculation performed only
> on a subset of the documents for the query?
Yes, Lucene uses an inverted index for this.

> If so, from which module could I get those candidate docs,
> so that I can perform my own similarity calculations? (I
> might need to rewrite the normalization factor, so
> only modifying the "similarity" model seems like it will not
> work.)

To change the normalisation you may consider implementing your own Weight:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Weight.html

For some example implementations of Weight, the Lucene source code in the
org.apache.lucene.search package is the best resource.

Using your own Weight also requires a subclass of Query that returns this
weight in the createWeight() method.

> Or, is there a document describing the procedure of how Lucene
> performs search?

This describes the scoring:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
See also the DefaultSimilarity.

Regards,
Paul Elschot
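
To illustrate the point about the inverted index, here is a minimal, self-contained sketch in plain Java. It is not Lucene code and does not use Lucene's API: the class ToyInvertedIndex, the LengthNorm interface, and the tf/idf-style formula are all hypothetical, chosen only to show that scoring touches just the documents found in the postings of the query terms, and that a normalisation factor can be swapped in at that point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index. NOT Lucene's API; every name here is hypothetical. */
class ToyInvertedIndex {

    /** Hypothetical hook standing in for a custom normalisation factor. */
    interface LengthNorm {
        double norm(int numTerms);
    }

    // term -> (docId -> frequency of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // docId -> number of terms in the document
    private final Map<Integer, Integer> docLengths = new HashMap<>();

    /** Indexes a document; the text is assumed to be non-empty. */
    void add(int docId, String text) {
        String[] terms = text.trim().toLowerCase().split("\\s+");
        docLengths.put(docId, terms.length);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /**
     * Scores only the candidate documents, i.e. those listed in the
     * postings of at least one query term. Other documents are never visited.
     */
    Map<Integer, Double> search(String query, LengthNorm lengthNorm) {
        Map<Integer, Double> scores = new TreeMap<>();
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in any document
            }
            // Illustrative idf-like weight: rarer terms count more.
            double idf = Math.log(1.0 + (double) docLengths.size() / docs.size());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                double tf = Math.sqrt(e.getValue());
                double norm = lengthNorm.norm(docLengths.get(e.getKey()));
                scores.merge(e.getKey(), tf * idf * norm, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(0, "lucene uses an inverted index");
        index.add(1, "ranking with a custom normalisation factor");
        index.add(2, "inverted index of terms");
        // Custom normalisation: penalise longer documents.
        System.out.println(index.search("inverted index", n -> 1.0 / Math.sqrt(n)));
    }
}
```

In this sketch document 1 never receives a score for the query "inverted index", because it appears in no postings list for either term. In Lucene itself the corresponding extension points are the Weight and Similarity classes linked above, whose real signatures differ from this toy.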
