On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote:
> 
> This might be a stupid question.
> 
> When performing retrieval for a query, does Lucene first get
> a subset of candidate matches and then perform the ranking
> on that set? That is, is the similarity calculation performed
> only on the subset of documents that match the query?

Yes. Lucene uses an inverted index for this, so only documents
that match the query are scored.
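
To illustrate the idea, here is a minimal sketch of candidate selection
with an inverted index. It is a toy, not Lucene code: the class
ToyInvertedIndex and its methods add() and candidates() are made up for
this example, and it assumes a plain OR query over whitespace-separated
terms.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index; NOT Lucene code, all names are hypothetical.
final class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document (lowercased, split on whitespace) and returns its id.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Candidate set for an OR query: the union of the posting lists of
    // the query terms. Documents containing none of the terms are never
    // visited, so no similarity is computed for them.
    Set<Integer> candidates(String... queryTerms) {
        Set<Integer> result = new TreeSet<>();
        for (String term : queryTerms) {
            result.addAll(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("lucene uses an inverted index");          // doc 0
        index.add("ranking is done on matching documents");  // doc 1
        index.add("an index maps terms to documents");       // doc 2
        // Only docs 0 and 1 contain a query term; doc 2 is skipped.
        System.out.println(index.candidates("inverted", "ranking"));
    }
}
```

Ranking would then run only over the ids returned by candidates().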

> If so, from which module could I get those candidate docs
> so that I can perform my own similarity calculations? I
> might need to rewrite the normalization factor, so
> modifying only the "similarity" model probably will not
> work.

To change the normalisation you may consider implementing
your own Weight:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Weight.html
For example implementations of Weight, the Lucene source code
in the org.apache.lucene.search package is the best resource.

Using your own Weight also requires a subclass of Query that returns
this weight in the createWeight() method.
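
A rough sketch of how the pieces fit together, in plain Java and without
the Lucene jar. SimpleWeight below is a simplified stand-in that mirrors
only three methods of Weight (sumOfSquaredWeights(), normalize() and
getValue()); the real interface has more methods, such as the one that
creates the scorer. UnnormalisedWeight and CustomQuerySketch are
hypothetical names, and the weight() method only approximates the
sequence in which a query prepares its weight, so check the Query source
before relying on it.

```java
// Plain-Java sketch; none of these types are Lucene classes.
final class CustomWeightSketch {

    // Simplified stand-in for part of org.apache.lucene.search.Weight.
    interface SimpleWeight {
        float sumOfSquaredWeights(); // phase 1: report the raw weight, squared
        void normalize(float norm);  // phase 2: receive the query norm
        float getValue();            // final weight used while scoring
    }

    // Hypothetical weight that deliberately ignores the query norm,
    // as one example of changing the normalisation.
    static final class UnnormalisedWeight implements SimpleWeight {
        private final float rawWeight;
        private float value;

        UnnormalisedWeight(float rawWeight) {
            this.rawWeight = rawWeight;
            this.value = rawWeight;
        }

        public float sumOfSquaredWeights() {
            return rawWeight * rawWeight;
        }

        public void normalize(float norm) {
            value = rawWeight; // the default behaviour would scale by norm here
        }

        public float getValue() {
            return value;
        }
    }

    // Stand-in for a Query subclass whose createWeight() returns the
    // custom weight.
    static final class CustomQuerySketch {
        private final float boost;

        CustomQuerySketch(float boost) {
            this.boost = boost;
        }

        SimpleWeight createWeight() {
            return new UnnormalisedWeight(boost);
        }

        // Two-phase preparation: collect the sum of squared weights,
        // derive a norm from it, then hand the norm back to the weight.
        SimpleWeight weight() {
            SimpleWeight w = createWeight();
            float sum = w.sumOfSquaredWeights();
            float norm = (float) (1.0 / Math.sqrt(sum));
            w.normalize(norm);
            return w;
        }
    }

    public static void main(String[] args) {
        SimpleWeight w = new CustomQuerySketch(2.0f).weight();
        System.out.println(w.getValue()); // the boost, unaffected by the norm
    }
}
```

With real Lucene classes the same shape applies: the Query subclass
returns the custom Weight, and the Weight decides what to do with the
norm it is given.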

> Or is there a document describing the procedure Lucene
> uses to perform a search?

This describes the scoring:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
See also DefaultSimilarity.
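
As a quick orientation, the factors that DefaultSimilarity computes can
be restated in plain Java as below. The class name SimilaritySketch and
the method singleTermScore() are made up for this example. The combined
score is a simplification that assumes a single-term query with all
boosts equal to 1, and it leaves out the query norm, the coord factor
and the lossy one-byte encoding of the length norm in the index.

```java
// Plain-Java restatement of DefaultSimilarity's factors; not Lucene code.
final class SimilaritySketch {

    // Term frequency factor: square root of the raw frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Field length normalisation: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Simplified score for one term in one document (assumptions in the
    // text above): idf enters twice, once from the query side and once
    // from the document side.
    static float singleTermScore(float freq, int docFreq, int numDocs, int numTerms) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a 100-term field, present in
        // 10 of 1000 documents.
        System.out.println(singleTermScore(4f, 10, 1000, 100));
    }
}
```

Replacing lengthNorm() corresponds to overriding Similarity; changing
how the query-side factor is normalised is what needs a custom Weight.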

Regards,
Paul Elschot

