Overriding DefaultSimilarity to not consider tf/idf and friends

Damian Birchler Mon, 05 Nov 2012 02:27:26 -0800

Hi everyone

We are using Lucene to search for possible duplicates in an address database. 
We create an index with a document for each person in the database. Each 
document has a field with one term for the first name, a field with one term 
for the last name and so on. I think in this setting it doesn't make sense to 
let term frequency, inverse document frequency and friends influence the 
document score (or does it?). For this reason I'm thinking of overriding 
DefaultSimilarity to not take tf/idf into account when scoring.


Do you think that's a reasonable thing to do? If so, how should I proceed? I'm
looking for implementation details here: should I, for example, override the
method that calculates the term frequency so that it just returns a constant?
(Off the top of my head, I wouldn't know what a sensible constant would be.)
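
To make the question concrete, here is a rough standalone sketch in plain Java
(no Lucene dependency) of the two factors as I understand DefaultSimilarity
documents them, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), next
to the "flat" variants I have in mind. The class name FlatScoringSketch and the
flat* methods are made up for illustration, and the constant 1.0 is only my
guess at a neutral value. In a real Similarity subclass I assume the methods to
override would be tf and idf, but I have not checked their exact signatures for
our Lucene version.

```java
public class FlatScoringSketch {

    // Default term-frequency factor as documented for DefaultSimilarity: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Default inverse-document-frequency factor as documented:
    // 1 + ln(numDocs / (docFreq + 1)).
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical flat variant: a term either matches (1) or does not (0).
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Hypothetical flat variant: rare and common terms weigh the same.
    static float flatIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        long numDocs = 100000; // made-up index size

        // Each field holds exactly one term, so freq is 1 for every match.
        System.out.println("default tf(1) = " + defaultTf(1.0f));
        System.out.println("flat tf(1)    = " + flatTf(1.0f));

        // Made-up document frequencies for a rare and a common last name.
        System.out.println("default idf, rare name   = " + defaultIdf(3, numDocs));
        System.out.println("default idf, common name = " + defaultIdf(5000, numDocs));
        System.out.println("flat idf, any name       = " + flatIdf(5000, numDocs));
    }
}
```

If this is right, then with one term per field the default tf is already 1 for
every match, so the factor that actually differs between documents is idf: a
match on a rare name would score higher than a match on a common one. Whether
that is desirable for duplicate detection is part of what I'm asking.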

Thanks a lot,
Damian
