Thanks for the reply, Walter.
Recently Robert commented on the PR with a link to
https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages
which gives arguments against my proposal. Honestly, I'm still in doubt.
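
To make the idea a bit more concrete, below is a rough sketch of the kind of
thing I had in mind (assuming BM25 and fabricated "average" stats for the
virtual doc; idealScore, AVG_TF and ASSUMED_NORM are placeholders I made up,
not an existing API):

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.Similarity;

public class IdealScoreSketch {
  // Fabricated "average" stats for the virtual all-matching doc.
  private static final float AVG_TF = 1f;      // assumed term frequency
  private static final long ASSUMED_NORM = 1L; // crude stand-in for an encoded length norm

  /** Sums per-term scores as if a single virtual doc matched every term. */
  public static float idealScore(IndexSearcher searcher, String field, String... terms)
      throws IOException {
    Similarity sim = new BM25Similarity();
    CollectionStatistics collStats = searcher.collectionStatistics(field);
    if (collStats == null) {
      return 0f; // field has no indexed terms
    }
    float total = 0f;
    for (String text : terms) {
      Term term = new Term(field, text);
      int df = searcher.getIndexReader().docFreq(term);
      if (df == 0) {
        continue; // term absent from the index, contributes nothing
      }
      long ttf = searcher.getIndexReader().totalTermFreq(term);
      TermStatistics termStats = searcher.termStatistics(term, df, ttf);
      Similarity.SimScorer scorer = sim.scorer(1f, collStats, termStats);
      total += scorer.score(AVG_TF, ASSUMED_NORM); // score of the hypothetical match
    }
    return total;
  }
}

A real version would need sensible per-field average tf and length norms
instead of the crude constants, but it shows the shape of the idea.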

On Tue, Dec 6, 2022 at 8:15 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> As you point out, this is a probabilistic relevance model. Lucene uses a
> vector space model.
>
> A probabilistic model gives an estimate of how relevant each document is
> to the query. Unfortunately, its overall relevance isn’t as good as a
> vector space model’s.
>
> You could calculate an ideal score, but that can change every time a
> document is added to or deleted from the index, because of idf. So the
> ideal score isn’t a useful mental model.
>
> Essentially, you need to tell your users to worry about something that
> matters. The absolute value of the score does not matter.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev <m...@apache.org> wrote:
>
> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but
> we always reply that it's just a relative value. The maximum score of the
> matched docs is not an answer.
> Ultimately we need to measure how much sense a query makes in the index,
> e.g. the query [jet OR propulsion OR spider] should be measured as
> nonsense, because the best-matching docs have much lower scores than a
> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
> spider].
> Could there be a method that returns the maximum possible score if all
> query terms matched? Something like stubbing postings on a virtual
> all_matching doc with average stats like tf and field length, and letting
> the scorers kick in? It reminds me somewhat of probabilistic retrieval, but
> not much. Is there anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>
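
Regarding the point about the ideal score shifting with the index: taking
Lucene's BM25 idf, log(1 + (N - df + 0.5) / (df + 0.5)), a term with df=10
contributes about 4.56 with N=1,000 docs but about 5.25 after another 1,000
non-matching docs are added, so the per-term ceiling drifts as the index
grows even though nothing about the matching docs changed.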

-- 
Sincerely yours
Mikhail Khludnev
