As you point out, this is a probabilistic relevance model. Lucene uses a vector 
space model.

A probabilistic model gives an estimate of how relevant each document is to the 
query. Unfortunately, the overall relevance of such models isn’t as good as that 
of a vector space model.

You could calculate an ideal score, but that can change every time a document 
is added to or deleted from the index, because of idf. So the ideal score isn’t 
a useful mental model. 
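
To make the idf point concrete, here is a minimal sketch in Python. It assumes 
a BM25-style idf, log(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of 
documents and n is the number containing the term. The function name and the 
counts are made up for illustration; this is not a Lucene API.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """BM25-style idf (assumed formula): log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical index: 1000 documents, 10 of which contain the term.
before = bm25_idf(1000, 10)

# Add 100 documents that do not contain the term.
after_add = bm25_idf(1100, 10)

# Instead, delete one of the documents that contains the term.
after_delete = bm25_idf(999, 9)

# All three values differ, so any "ideal" score built on idf moves too.
print(before, after_add, after_delete)
```

The query and the matching document are the same in all three cases; only the 
rest of the index changed, and the term weight changed with it.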

Essentially, you need to tell your users to worry about something that matters. 
The absolute value of the score does not matter.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev <m...@apache.org> wrote:
> 
> Hello dev! 
> Users are interested in the meaning of absolute value of the score, but we 
> always reply that it's just relative value. Maximum score of matched docs is 
> not an answer. 
> Ultimately we need to measure how much sense a query has in the index. e.g. 
> [jet OR propulsion OR spider] query should be measured like nonsense, because 
> the best matching docs have much lower scores than hypothetical (and assuming 
> absent) doc matching [jet AND propulsion AND spider].
> Could it be a method that returns the maximum possible score if all query 
> terms would match. Something like stubbing postings on virtual all_matching 
> doc with average stats like tf and field length and kicks scorers in? It 
> reminds me something about probabilistic retrieval, but not much. Is there 
> anything like this already?       
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
