Hi everyone,

We are using Lucene to search for possible duplicates in an address database. 
We create an index with a document for each person in the database. Each 
document has a field with one term for the first name, a field with one term 
for the last name and so on. I think in this setting it doesn't make sense to 
let term frequency, inverse document frequency and friends influence the 
document score (or does it?). For this reason I'm thinking of overriding 
DefaultSimilarity to not take tf/idf into account when scoring.

Do you think that's a reasonable thing to do? If so, how should I proceed? I'm 
looking for implementation details here. For example, should I override the 
method that calculates the term frequency to just return a constant? (Off the 
top of my head, I wouldn't know what a sensible constant would be.)
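
To make the question concrete, here is a minimal, Lucene-free sketch of what I 
have in mind. It assumes the default factors are tf = sqrt(freq) and 
idf = 1 + ln(numDocs / (docFreq + 1)), which is my understanding of 
DefaultSimilarity and may differ between versions. The class and method names 
below are made up for illustration and are not Lucene API, and the document 
counts are invented. In Lucene itself I would presumably override the 
corresponding tf and idf methods of DefaultSimilarity instead.

```java
// Hypothetical stand-alone model of the per-term scoring factors.
// Nothing here is Lucene API; it only illustrates the arithmetic.
final class ConstantTfIdfSketch {

    // Assumed default tf factor: square root of the term frequency.
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default idf factor: rarer terms get a larger weight.
    static float classicIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Flattened tf: 1 if the term occurs at all, 0 otherwise.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Flattened idf: every term weighs the same, however rare it is.
    static float flatIdf(int docFreq, int numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int numDocs = 100000;      // invented index size
        int commonNameDocs = 5000; // invented: documents with a common last name
        int rareNameDocs = 5;      // invented: documents with a rare last name

        System.out.println("classic idf, common name: "
                + classicIdf(commonNameDocs, numDocs));
        System.out.println("classic idf, rare name:   "
                + classicIdf(rareNameDocs, numDocs));
        System.out.println("flat idf, common name:    "
                + flatIdf(commonNameDocs, numDocs));
        System.out.println("flat idf, rare name:      "
                + flatIdf(rareNameDocs, numDocs));
    }
}
```

Since both factors are multiplied into the score, 1.0 looks like the neutral 
constant: with the flattened variants a match on a rare name counts exactly as 
much as a match on a common one, which is the behaviour I am asking about.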

Thanks a lot,
Damian
