Re: Lucene nightly build: similarity score per field

Robert Muir Thu, 03 Mar 2011 07:35:45 -0800

On Thu, Mar 3, 2011 at 10:25 AM, Patrick Diviacco
<[email protected]> wrote:
> I've downloaded Lucene nightly build because I need to customize the
> similarity *per field*.
>
> However I don't see the field parameter passed to the methods to compute the
> score such as "tf" and "idf"...
>
> how can I implement different similarities score per document field then ?
>


Hi, the way to set this up is to use SimilarityProvider to configure
Similarities per field. For example, fields A, B, and C might use
Similarity1 while field D uses Similarity2.
You set your SimilarityProvider on both the IndexWriter and the
IndexSearcher, and it must implement this factory method:

  Similarity get(String field)
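
As a rough illustration, here is a minimal sketch of such a provider. It is
written against hypothetical stand-in interfaces, not the real Lucene
classes: the actual SimilarityProvider and Similarity in the nightly build
may declare more methods than shown here. The names PerFieldProvider, set()
and the tf() formulas are assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldSimilaritySketch {
    // Hypothetical stand-ins, NOT the real Lucene types.
    interface Similarity {
        float tf(float freq);
    }

    interface SimilarityProvider {
        Similarity get(String field);
    }

    // Illustrative scoring only: square root of the term frequency.
    static class Similarity1 implements Similarity {
        public float tf(float freq) { return (float) Math.sqrt(freq); }
    }

    // Illustrative scoring only: ignores how often the term occurs.
    static class Similarity2 implements Similarity {
        public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
    }

    // Maps field names to Similarity instances, with a fallback for
    // fields that were never configured.
    static class PerFieldProvider implements SimilarityProvider {
        private final Map<String, Similarity> perField = new HashMap<>();
        private final Similarity fallback;

        PerFieldProvider(Similarity fallback) { this.fallback = fallback; }

        PerFieldProvider set(String field, Similarity sim) {
            perField.put(field, sim);
            return this;
        }

        public Similarity get(String field) {
            return perField.getOrDefault(field, fallback);
        }
    }

    public static void main(String[] args) {
        Similarity s1 = new Similarity1();
        PerFieldProvider provider = new PerFieldProvider(s1)
            .set("A", s1).set("B", s1).set("C", s1)
            .set("D", new Similarity2());
        System.out.println(provider.get("A").tf(4f)); // 2.0
        System.out.println(provider.get("D").tf(4f)); // 1.0
    }
}
```

With the real API you would pass the provider to the IndexWriter and
IndexSearcher instead of calling get() yourself.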

Here are the reasons for this factory design (versus simply adding
a field parameter to every method):
1. performance: we ask the SimilarityProvider for the per-field
Similarity once, up front. You would probably use a HashMap or something
similar to return the correct one. If that lookup had to happen on
every single call to tf(), it would slow down queries significantly.
2. flexibility: we are working to generalize Similarity, and the
existing code you see may become TFIDFSimilarity. So in the future you
might have field1 using TF-IDF and field2 using something else
(e.g. BM25), with a totally different API and scoring system.
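
To make the performance point concrete, here is a small sketch of the
"resolve once, then score" pattern. Again, everything in it is a
hypothetical stand-in rather than Lucene code: CountingProvider, sumTf()
and the "body" field exist only to show that the map lookup happens once
per field, not once per tf() call.

```java
import java.util.HashMap;
import java.util.Map;

public class UpFrontLookupSketch {
    // Hypothetical stand-in, NOT the real Lucene Similarity.
    interface Similarity {
        float tf(float freq);
    }

    // Toy provider that counts how often it is consulted.
    static class CountingProvider {
        final Map<String, Similarity> perField = new HashMap<>();
        int lookups = 0;

        Similarity get(String field) {
            lookups++;
            return perField.get(field);
        }
    }

    // Resolve the Similarity once, then reuse it for every frequency.
    // Assumes the field has been registered; an unknown field would
    // yield null here.
    static float sumTf(CountingProvider provider, String field, int[] freqs) {
        Similarity sim = provider.get(field); // single up-front lookup
        float total = 0f;
        for (int freq : freqs) {
            total += sim.tf(freq); // no map lookup inside the loop
        }
        return total;
    }

    public static void main(String[] args) {
        CountingProvider provider = new CountingProvider();
        provider.perField.put("body", freq -> (float) Math.sqrt(freq));
        float total = sumTf(provider, "body", new int[] {1, 4, 9});
        System.out.println(total);            // 6.0
        System.out.println(provider.lookups); // 1
    }
}
```

Had the field been a parameter of tf() itself, the lookup would sit inside
the loop and run once per term occurrence.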

