I think what I'm looking for is to multiply the term frequency of each term by the similarity score.
E.g.:

'shoes': exact match, so tf * 1
'socks': similarity = 0.8, so tf * 0.8
'clothes': similarity = 0.65, so tf * 0.65

Is there a way to achieve this with Lucene's API, or do I need to extend the Similarity class myself?

On Fri, Jul 3, 2020 at 8:44 PM Ali Akhtar <ali@ali.actor> wrote:

> Hellooo,
>
> Suppose a user enters ‘box of shoes’ in my search box. I have two
> documents titled ‘box of clothes’ and ‘box of socks’. I’ve figured out
> through a separate algorithm that ‘socks’ is more similar to ‘shoes’ than
> ‘clothes’ is.
>
> I even have a numeric score for the similarity: for socks it’s 0.8 and for
> clothes it’s 0.65.
>
> How can I feed this info to Lucene to help it rank socks higher than
> clothes?
>
> I still want the usual tf-idf rules to apply, i.e. ‘box’ and ‘of’ occur in a
> lot of documents but ‘socks’ and ‘clothes’ are rarer, so they should be
> given more importance.
>
> So I don’t want to have to override the similarity class. I just want to
> be able to pass in the info that ‘socks’ and ‘clothes’ are both kinda like
> synonyms for shoes, but socks is more similar to shoes than clothes. Maybe
> create a boost using the similarity score which doesn’t artificially boost
> frequent / less important terms.
>
> If I just provided them as regular synonyms, then they would both be
> considered equal in weight.
>
> Thanks.
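
To make the arithmetic at the top of this message concrete, here is a minimal sketch in plain Java of the weighting I have in mind. It does not use Lucene at all: the class name, the method names, the simplified idf formula log(N / df), and the document counts are all assumptions made up for illustration, and Lucene's real scoring formulas differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a toy tf-idf where tf is scaled by a per-term similarity.
class WeightedTfSketch {

    // Simplified idf, log(N / df). This is an assumption, not Lucene's formula.
    static double idf(int docCount, int docFreq) {
        return Math.log((double) docCount / docFreq);
    }

    // The proposed weighting: (tf * similarity) * idf.
    static double weightedScore(int tf, double similarity, int docCount, int docFreq) {
        return tf * similarity * idf(docCount, docFreq);
    }

    public static void main(String[] args) {
        // Similarity of each candidate term to the query term 'shoes'.
        Map<String, Double> similarityToShoes = new LinkedHashMap<>();
        similarityToShoes.put("shoes", 1.0);
        similarityToShoes.put("socks", 0.8);
        similarityToShoes.put("clothes", 0.65);

        // Hypothetical corpus statistics, identical for every term so that
        // only the similarity factor changes the result.
        int docCount = 1000;
        int docFreq = 10;
        int tf = 1;

        for (Map.Entry<String, Double> e : similarityToShoes.entrySet()) {
            double score = weightedScore(tf, e.getValue(), docCount, docFreq);
            System.out.println(e.getKey() + " -> " + score);
        }
    }
}
```

With equal tf and idf, the score for 'socks' comes out higher than the score for 'clothes', and a very common term still gets a small score because its idf is small regardless of similarity.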