Hello. Just FYI, I sketched a little prototype:
https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53

It estimates the maximum possible score for a query against an index:
- it creates a virtual index (LikelyReader), which
  - contains all terms from the original index with the same docCount
  - matches all of these terms in the first doc (docnum=0) with the maximum termFreq (how to estimate that is a separate question).

So, if we search over this LikelyReader, we get a score estimate that can hardly be exceeded by the same query over the original index. I suppose this might be useful for LTR as a better alternative to the query score feature.
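To make the idea concrete, here is a minimal plain-Java sketch. It is not the prototype's code and does not use the Lucene API. It assumes a BM25-style per-term score with k1=1.2 and b=0.75, assumes the virtual doc has average field length, and simply sums the per-term scores as a disjunction would. The class MaxScoreSketch, the TermStats holder and all the numbers in main are made up for illustration.

```java
import java.util.List;

class MaxScoreSketch {
    // Hypothetical holder for per-term index statistics (not a Lucene class).
    static final class TermStats {
        final String term;
        final long docFreq;      // number of docs containing the term
        final long maxTermFreq;  // assumed upper bound of the term frequency in one doc
        TermStats(String term, long docFreq, long maxTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.maxTermFreq = maxTermFreq;
        }
    }

    // Assumed BM25 parameters.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // BM25-style inverse document frequency.
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one term in the virtual doc that matches it maxTermFreq times.
    static double termScore(TermStats t, long docCount, double docLen, double avgDocLen) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf(t.docFreq, docCount) * t.maxTermFreq / (t.maxTermFreq + norm);
    }

    // Estimated score of the virtual doc that matches every query term.
    static double maxPossibleScore(List<TermStats> queryTerms, long docCount,
                                   double docLen, double avgDocLen) {
        double sum = 0;
        for (TermStats t : queryTerms) {
            sum += termScore(t, docCount, docLen, avgDocLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up statistics for the [jet OR propulsion OR spider] example.
        List<TermStats> query = List.of(
            new TermStats("jet", 50, 7),
            new TermStats("propulsion", 20, 5),
            new TermStats("spider", 30, 9));
        double estimate = maxPossibleScore(query, 1000, 100.0, 100.0);
        System.out.println("estimated max possible score: " + estimate);
    }
}
```

Dividing the actual top score of a query by this estimate would give a rough measure of how close the best hit comes to the virtual all-matching doc. That ratio is only my reading of how the estimate could be used.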

On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <m...@apache.org> wrote:

> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but we
> always reply that it's just a relative value. The maximum score of matched docs
> is not an answer.
> Ultimately we need to measure how much sense a query makes in the index.
> e.g. the [jet OR propulsion OR spider] query should be measured as
> nonsense, because the best matching docs have much lower scores than a
> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
> spider].
> Could there be a method that returns the maximum possible score if all query
> terms matched? Something like stubbing postings on a virtual all_matching
> doc with average stats like tf and field length and kicking the scorers in? It
> reminds me of something in probabilistic retrieval, but not much. Is there
> anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev
>

--
Sincerely yours
Mikhail Khludnev