I think what I'm looking for is to multiply the term frequency of each term
by the similarity score.

E.g., for 'shoes', it's an exact match, so tf * 1
For 'socks', similarity = 0.8 -> tf * 0.8
For 'clothes', similarity = 0.65 -> tf * 0.65

Is there a way to achieve this w/ Lucene's API or do I need to extend the
similarity class myself?
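
To make the arithmetic above concrete, here is a minimal plain-Java sketch of
the weighting I have in mind. It does not use Lucene at all; the class name
WeightedTermFrequency, the helper weightedTf, and the assumption that each
term occurs once in its title (tf = 1) are only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WeightedTermFrequency {

    // Hypothetical helper: scales a raw term frequency by a similarity
    // score, which is assumed to lie in (0, 1] with 1.0 meaning exact match.
    static double weightedTf(int termFrequency, double similarity) {
        if (similarity <= 0.0 || similarity > 1.0) {
            throw new IllegalArgumentException("similarity must be in (0, 1]");
        }
        return termFrequency * similarity;
    }

    public static void main(String[] args) {
        // Similarity of each title term to the query term 'shoes',
        // using the scores from this thread.
        Map<String, Double> similarityToShoes = new LinkedHashMap<>();
        similarityToShoes.put("shoes", 1.0);
        similarityToShoes.put("socks", 0.8);
        similarityToShoes.put("clothes", 0.65);

        int tf = 1; // assumption: each term occurs once in its title
        for (Map.Entry<String, Double> e : similarityToShoes.entrySet()) {
            System.out.printf("%s -> weighted tf %.2f%n",
                    e.getKey(), weightedTf(tf, e.getValue()));
        }
    }
}
```

With equal raw tf, 'socks' ends up with a higher weighted tf than 'clothes',
which is the ordering I want. On the Lucene side, one thing that may be worth
checking is wrapping each synonym's TermQuery in a BoostQuery; as far as I
understand, that scales the clause's whole score (idf included) rather than
tf alone, so it is close to, but not exactly, the formula above.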

On Fri, Jul 3, 2020 at 8:44 PM Ali Akhtar <ali@ali.actor> wrote:

> Hellooo,
>
> Suppose a user enters ‘box of shoes’ in my search box. I have two
> documents titled ‘box of clothes’ and ‘box of socks’. I’ve figured out
> through a separate algorithm that ‘socks’ is more similar to ‘shoes’ than
> clothes.
>
> I even have a numeric score for the similarity: for socks it’s 0.8 and for
> clothes it’s 0.65.
>
> How can I feed this info to lucene to help it rank socks higher than
> clothes?
>
> I still want the usual tf-idf rules to apply. I.e. ‘box’ and ‘of’ occur in a
> lot of documents but ‘socks’ and ‘clothes’ are rarer so they should be
> given more importance.
>
> So I don’t want to have to overwrite the similarity class. I just want to
> be able to pass in the info that ‘socks’ and ‘clothes’ are both kinda like
> synonyms for shoes, but socks is more similar to shoes than clothes. Maybe
> create a boost using the similarity score which doesn’t artificially boost
> frequent / less important terms.
>
> If I just provided them as regular synonyms, then they will both be
> considered equal in weight.
>
> Thanks.
>
>
