Re: Tuning MoreLikeThis scoring algorithm

Robert Muir Fri, 28 May 2021 18:27:47 -0700

See https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages
which has some broken nabble links, but is still valid.


TLDR: Scoring just doesn't work the way you think. Don't try to
interpret it as an absolute value; it is a relative one.
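
If what you actually want is a fixed 0..1 overlap measure, one option is
to compute it outside of Lucene's scoring. Below is a minimal sketch in
plain Java, not a Lucene API: the class TermOverlap, its naive tokenizer,
and the choice of Jaccard overlap over term sets are all assumptions made
for illustration, and the test sentence is my guess at the "famous test
string". It gives 1.0 only when the two term sets are identical, and less
than 1.0 when either side has terms the other lacks.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class TermOverlap {

    // Naive tokenizer (an assumption): lowercase, split on anything
    // that is not a letter or digit, drop empty tokens.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Jaccard overlap of the two term sets: |Q and D| / |Q or D|.
    // Returns 0.0 when both sides are empty, to avoid dividing by zero.
    static double jaccard(String query, String doc) {
        Set<String> q = terms(query);
        Set<String> d = terms(doc);
        Set<String> union = new HashSet<>(q);
        union.addAll(d);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(q);
        inter.retainAll(d);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        String doc = "The quick brown fox jumps over the lazy dog";
        System.out.println(jaccard(doc, doc));               // 1.0
        System.out.println(jaccard("quick brown fox", doc)); // 0.375
        System.out.println(jaccard("quick brown", doc));     // 0.25
    }
}
```

This ignores term frequency and index statistics entirely, so it is cheap,
but it is a set comparison rather than a relevance score. You could still
use MoreLikeThis to fetch candidates and apply something like this to the
hits afterwards.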

On Fri, May 28, 2021 at 1:36 PM TK Solr <tksol...@sonic.net> wrote:
>
> I'd like to have suggestions on changing the scoring algorithm
> of MoreLikeThis.
>
> When I feed the identical string as the content of a document in the index
> to MoreLikeThis.like("field", new StringReader(docContent)),
> I get a score below the 1.0 that I expect (0.944 in one of my test cases).
>
> What is the easiest way to change this so that the score is 1.0 when
> all the terms in the query match all the terms of a document?
> The score should be less than 1.0 if the query contains only a part of the
> terms from the document. (Needless to say, the score should also be less
> than 1.0 if only part of the query terms are found in the document.)
>
> For my purpose, I don't need a sophisticated search relevancy technique
> like TF-IDF. I'd like it to work faster/cheaper.
>
> I tried using BooleanSimilarity, but that ended up returning a score above 1.0.
> Also the score is the same as long as all the terms in the query are matched.
> For example, querying "quick brown fox" and "quick brown" yield the same score
> against the doc that has the famous test string.
>
>
> TK
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

