As you point out, this is a probabilistic relevance model. Lucene uses a vector space model.

A probabilistic model gives an estimate of how relevant each document is to the query. Unfortunately, the overall relevance of probabilistic models isn’t as good as that of a vector space model.

You could calculate an ideal score, but that can change every time a document is added to or deleted from the index, because of idf. So the ideal score isn’t a useful mental model.

Essentially, you need to tell your users to worry about something that matters. The absolute value of the score does not matter.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev <m...@apache.org> wrote:
>
> Hello dev!
> Users are interested in the meaning of absolute value of the score, but we
> always reply that it's just relative value. Maximum score of matched docs is
> not an answer.
> Ultimately we need to measure how much sense a query has in the index. e.g.
> [jet OR propulsion OR spider] query should be measured like nonsense, because
> the best matching docs have much lower scores than hypothetical (and assuming
> absent) doc matching [jet AND propulsion AND spider].
> Could it be a method that returns the maximum possible score if all query
> terms would match. Something like stubbing postings on virtual all_matching
> doc with average stats like tf and field length and kicks scorers in? It
> reminds me something about probabilistic retrieval, but not much. Is there
> anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev