That article is copied from the old wiki, so it is much earlier than 2019, more 
like 2009. Unfortunately, the links to the email discussion are all dead, but 
the issues described in the article still apply.

If you really want to go down that path, you might be able to do it with a 
similarity class that implements a probabilistic relevance model. I’d start the 
literature search with this Google query.

probabilistic information retrieval 
<https://www.google.com/search?client=safari&rls=en&q=probablistic+information+retrieval&ie=UTF-8&oe=UTF-8>
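For a concrete starting point, one classic probabilistic retrieval model is query likelihood with Jelinek-Mercer smoothing. This is a minimal Python sketch of the idea, not Lucene code; the function name and parameters are illustrative only:

```python
import math

def query_likelihood(query_terms, doc_tf, doc_len, coll_tf, coll_len, lam=0.5):
    """Log-probability that the document's language model generates the query,
    with Jelinek-Mercer smoothing: P(t|d) = lam*tf/|d| + (1-lam)*cf/|C|."""
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf.get(t, 0) / doc_len
        p_coll = coll_tf.get(t, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score
```

A similarity class built on this would rank by how probable the query is under each document's language model, which is the kind of estimate a probabilistic model gives directly.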

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 18, 2022, at 2:47 AM, Mikhail Khludnev <m...@apache.org> wrote:
> 
> Thanks for the reply, Walter.
> Recently Robert commented on a PR with the link 
> https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages 
> <https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages>, which 
> gives arguments against my proposal. Honestly, I'm still in doubt.
> 
> On Tue, Dec 6, 2022 at 8:15 PM Walter Underwood <wun...@wunderwood.org 
> <mailto:wun...@wunderwood.org>> wrote:
> As you point out, this is a probabilistic relevance model. Lucene uses a 
> vector space model.
> 
> A probabilistic model gives an estimate of how relevant each document is to 
> the query. Unfortunately, its overall relevance ranking isn’t as good as a 
> vector space model’s.
> 
> You could calculate an ideal score, but that can change every time a document 
> is added to or deleted from the index, because of idf. So the ideal score 
> isn’t a useful mental model. 
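To illustrate the point with the textbook formula idf = log(N/df) (Lucene's BM25 uses a smoothed variant, but the effect is the same), adding even one unrelated document shifts the value:

```python
import math

# Textbook idf = log(N / df): adding one unrelated document changes N,
# so every term's idf -- and any precomputed "ideal" score -- drifts.
def idf(num_docs, doc_freq):
    return math.log(num_docs / doc_freq)

before = idf(1000, 10)  # 10 of 1000 docs contain the term
after = idf(1001, 10)   # one unrelated doc added; the "ideal" score has moved
```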
> 
> Essentially, you need to tell your users to worry about something that 
> matters. The absolute value of the score does not matter.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  (my blog)
> 
>> On Dec 5, 2022, at 11:02 PM, Mikhail Khludnev <m...@apache.org 
>> <mailto:m...@apache.org>> wrote:
>> 
>> Hello dev! 
>> Users are interested in the meaning of the absolute value of the score, but we 
>> always reply that it's just a relative value. The maximum score of the matched 
>> docs is not an answer. 
>> Ultimately we need to measure how much sense a query has in the index, e.g. a 
>> [jet OR propulsion OR spider] query should be measured as nonsense, because 
>> the best-matching docs have much lower scores than a hypothetical (and 
>> presumably absent) doc matching [jet AND propulsion AND spider].
>> Could there be a method that returns the maximum possible score if all query 
>> terms matched? Something like stubbing postings on a virtual all_matching doc 
>> with average stats (tf, field length) and letting the scorers kick in? It 
>> reminds me of something in probabilistic retrieval, but only vaguely. Is there 
>> anything like this already?
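The "virtual all_matching doc" idea above can be sketched in a few lines. This is a Python illustration under assumed stats, using a Lucene-style BM25 term formula; the function names are hypothetical, not an existing Lucene method:

```python
import math

def bm25_term(tf, df, num_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # BM25 with a Lucene-style smoothed idf (always non-negative).
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = tf / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

def hypothetical_max_score(term_stats, num_docs, avg_len):
    # Score a virtual doc of average field length in which every query term
    # occurs with its average tf -- the "all_matching" stub described above.
    # term_stats: list of (average_tf, doc_freq) pairs, one per query term.
    return sum(bm25_term(avg_tf, df, num_docs, avg_len, avg_len)
               for avg_tf, df in term_stats)
```

Comparing the best real score against this hypothetical maximum would give the kind of "does this query make sense in this index" ratio described, though as noted earlier the value drifts whenever the index statistics change.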
>> 
>> -- 
>> Sincerely yours
>> Mikhail Khludnev
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
