Thanks for the reply, Walter. Recently Robert commented on the PR with the link https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages, which gives arguments against my proposal. Honestly, I'm still in doubt.
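To make the idea concrete, below is a rough sketch of what I mean by a "hypothetical maximum score": feed average collection statistics into a Similarity as if a virtual document matched every query term once, and sum the per-term scores. All the numbers (doc counts, doc frequencies, the average field length) are invented, and the class is only an illustration, not an existing API:

import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.SmallFloat;

public class IdealScoreSketch {
  public static void main(String[] args) {
    BM25Similarity sim = new BM25Similarity();

    // Made-up field statistics: field, maxDoc, docCount, sumTotalTermFreq, sumDocFreq.
    CollectionStatistics collStats =
        new CollectionStatistics("body", 1_000_000, 1_000_000, 50_000_000, 40_000_000);

    // Virtual all-matching doc: average field length, each term occurring once.
    int avgFieldLength = (int) (collStats.sumTotalTermFreq() / collStats.docCount());
    long norm = SmallFloat.intToByte4(avgFieldLength); // same encoding as the norms field
    float avgTf = 1f;

    String[] terms = {"jet", "propulsion", "spider"};
    long[] docFreqs = {120_000, 8_000, 15_000}; // invented per-term doc frequencies

    float idealScore = 0f;
    for (int i = 0; i < terms.length; i++) {
      TermStatistics termStats =
          new TermStatistics(new BytesRef(terms[i]), docFreqs[i], docFreqs[i]);
      Similarity.SimScorer scorer = sim.scorer(1f, collStats, termStats);
      idealScore += scorer.score(avgTf, norm); // contribution of the virtual doc for this term
    }
    System.out.println("hypothetical max score = " + idealScore);
  }
}

As Walter notes below, such an "ideal" value would shift whenever documents are added or deleted, since the idf components change with the index.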
On Tue, Dec 6, 2022 at 8:15 PM Walter Underwood <wun...@wunderwood.org> wrote:

> As you point out, this is a probabilistic relevance model. Lucene uses a
> vector space model.
>
> A probabilistic model gives an estimate of how relevant each document is
> to the query. Unfortunately, their overall relevance isn’t as good as a
> vector space model.
>
> You could calculate an ideal score, but that can change every time a
> document is added to or deleted from the index, because of idf. So the
> ideal score isn’t a useful mental model.
>
> Essentially, you need to tell your users to worry about something that
> matters. The absolute value of the score does not matter.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev <m...@apache.org> wrote:
>
> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but we
> always reply that it's just a relative value. The maximum score of the matched
> docs is not an answer.
> Ultimately we need to measure how much sense a query has in the index.
> E.g. a [jet OR propulsion OR spider] query should be measured as nonsense,
> because the best matching docs have much lower scores than a hypothetical
> (and presumably absent) doc matching [jet AND propulsion AND spider].
> Could there be a method that returns the maximum possible score if all query
> terms matched? Something like stubbing postings on a virtual all_matching
> doc with average stats (tf, field length) and letting the scorers run on it. It
> reminds me a bit of probabilistic retrieval, but only loosely. Is there
> anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev


--
Sincerely yours
Mikhail Khludnev