Hello Alessandro.
Glad to hear! There is not much to update since the previously published link: just a tiny test. Guessing the max tf doesn't seem really reliable.
However, I've got another idea: can't Impacts give us an exact max score, the way Scorer#getMaxScore(int) does?
https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)
I don't know if it's possible and how to do it.

On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti <a.benede...@sease.io> wrote:

> Hi Mikhail,
> I was thinking again about this regarding Hybrid Search in Solr and the
> current
> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
> .
> Was there any progress on this? Any traction?
> Sooner or later I hope to get some funds to work on this, I keep you
> updated!
> I agree this would be useful in Learning To Rank and Hybrid Search in
> general.
> The current original score feature is unlikely to be useful if not
> normalised per an estimated maximum score.
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <m...@apache.org> wrote:
>
>> Hello.
>> Just FYI. I scratched a little prototype
>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>> To estimate maximum possible score for the query against an index:
>> - it creates a virtual index (LikelyReader), which
>> - contains all terms from the original index with the same docCount
>> - matching all of these terms in the first doc (docnum=0) with the
>> maximum termFreq (which estimating is a separate question).
>> So, if we search over this LikelyReader we get a score estimate, which
>> can hardly be exceeded by the same query over the original index.
>> I suppose this might be useful for LTR as a better alternative to the
>> query score feature.
>>
>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <m...@apache.org> wrote:
>>
>>> Hello dev!
>>> Users are interested in the meaning of absolute value of the score, but
>>> we always reply that it's just relative value. Maximum score of matched
>>> docs is not an answer.
>>> Ultimately we need to measure how much sense a query has in the index.
>>> e.g. [jet OR propulsion OR spider] query should be measured like
>>> nonsense, because the best matching docs have much lower scores than
>>> hypothetical (and assuming absent) doc matching [jet AND propulsion AND
>>> spider].
>>> Could it be a method that returns the maximum possible score if all
>>> query terms would match. Something like stubbing postings on virtual
>>> all_matching doc with average stats like tf and field length and kicks
>>> scorers in? It reminds me something about probabilistic retrieval, but not
>>> much. Is there anything like this already?
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

--
Sincerely yours
Mikhail Khludnev
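
For illustration only, the idea discussed in this thread (score a hypothetical document that matches every query term, then use that value to normalise real scores) might be sketched roughly as below. This is not the LikelyReader prototype and it does not use any Lucene API: the class MaxScoreSketch, its method names and the parameters k1 = 1.2, b = 0.75 are assumptions made for the example, and the formula is the textbook BM25 form, which may differ from Lucene's BM25Similarity in constant factors. The virtual document's length and the per-term maximum tf are inputs here, since estimating them is the open question raised above.

```java
/**
 * Hypothetical sketch (not Lucene API): estimate the score of a virtual
 * document that matches all query terms, using textbook BM25.
 */
final class MaxScoreSketch {
    // Commonly used BM25 defaults; assumed here, not taken from the thread.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Inverse document frequency of a term found in docFreq of docCount docs. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Textbook BM25 contribution of one term with term frequency tf. */
    static double termScore(double idf, double tf, double docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf * tf * (K1 + 1.0) / (tf + norm);
    }

    /**
     * Score of a virtual document that contains every query term with its
     * (estimated) maximum term frequency. docFreqs[i] and maxTermFreqs[i]
     * describe the i-th query term.
     */
    static double maxPossibleScore(long docCount, long[] docFreqs, long[] maxTermFreqs,
                                   double docLen, double avgDocLen) {
        double sum = 0.0;
        for (int i = 0; i < docFreqs.length; i++) {
            sum += termScore(idf(docCount, docFreqs[i]), maxTermFreqs[i], docLen, avgDocLen);
        }
        return sum;
    }

    /** Scale a real score into [0, 1] relative to the estimated maximum. */
    static double normalize(double score, double estimatedMax) {
        if (estimatedMax <= 0.0) {
            return 0.0;
        }
        return Math.min(1.0, score / estimatedMax);
    }
}
```

With such an estimate, a query like [jet OR propulsion OR spider] whose best hit matches only one term would normalise to a value well below 1, which is the "how much sense does the query make" signal asked for in the original message. How tight the estimate is depends entirely on the assumed document length and maximum tf.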