I'm trying to understand Impacts and need some help. https://github.com/apache/lucene/issues/5270#issuecomment-1223383919 Does it mean that calling advanceShallow(0) followed by getMaxScore(maxDoc-1) gives a good max score estimate, at least for a term query?
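
To make the question concrete, here is a minimal plain-Java sketch of the arithmetic I assume is behind a max score derived from impacts: score every (freq, length) pair and keep the largest value. Nothing below is a Lucene class. `Impact`, `idf`, `score` and `maxScore` are hypothetical stand-ins, the formula is a BM25-style one with raw document lengths instead of encoded norms, and k1, b and the sample numbers are made up for illustration.

```java
import java.util.List;

/** Illustration only: plain-Java stand-ins, not Lucene's Impacts or Scorer API. */
final class ImpactMaxScoreSketch {

  /** Hypothetical stand-in for one impact: a (term frequency, document length) pair. */
  static final class Impact {
    final int freq;
    final double docLength;

    Impact(int freq, double docLength) {
      this.freq = freq;
      this.docLength = docLength;
    }
  }

  /** BM25-style idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). */
  static double idf(long docCount, long docFreq) {
    return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
  }

  /** BM25-style score of a single (freq, docLength) pair; assumed formula, constant factors omitted. */
  static double score(double idf, int freq, double docLength,
                      double avgDocLength, double k1, double b) {
    double lengthNorm = k1 * (1 - b + b * docLength / avgDocLength);
    return idf * freq / (freq + lengthNorm);
  }

  /** Upper bound for a term: the best score over all of its (freq, docLength) pairs. */
  static double maxScore(List<Impact> impacts, double idf,
                         double avgDocLength, double k1, double b) {
    double max = 0;
    for (Impact impact : impacts) {
      max = Math.max(max, score(idf, impact.freq, impact.docLength, avgDocLength, k1, b));
    }
    return max;
  }

  public static void main(String[] args) {
    // Made-up impacts for one term: a short doc with tf=3, a long doc with tf=5, a tiny doc with tf=1.
    List<Impact> impacts = List.of(new Impact(3, 10), new Impact(5, 40), new Impact(1, 5));
    double termIdf = idf(1000, 50); // made-up docCount and docFreq
    System.out.printf("max score for the term: %.4f%n",
        maxScore(impacts, termIdf, 20.0, 1.2, 0.75));
  }
}
```

If the real thing works roughly like this, the bound is exact for a single term as long as the stored pairs cover every document, which is the part I would like confirmed.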

On Fri, May 10, 2024 at 11:21 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello Alessandro.
> Glad to hear!
> There's not much update from the previously published link: just a tiny
> test. Guessing max tf doesn't seem really reliable.
> However, I've got another idea:
> Can't Impacts give us an exact max score like
> https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)?
>
> I don't know if it's possible and how to do it.
>
> On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti <a.benede...@sease.io>
> wrote:
>
>> Hi Mikhail,
>> I was thinking again about this regarding Hybrid Search in Solr and the
>> current
>> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
>> .
>> Was there any progress on this? Any traction?
>> Sooner or later I hope to get some funds to work on this, I keep you
>> updated!
>> I agree this would be useful in Learning To Rank and Hybrid Search in
>> general.
>> The current original score feature is unlikely to be useful if not
>> normalised per an estimated maximum score.
>>
>> Cheers
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
>>
>> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <m...@apache.org> wrote:
>>
>>> Hello.
>>> Just FYI. I scratched a little prototype
>>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>>> To estimate maximum possible score for the query against an index:
>>> - it creates a virtual index (LikelyReader), which
>>>   - contains all terms from the original index with the same docCount
>>>   - matching all of these terms in the first doc (docnum=0) with the
>>>     maximum termFreq (which estimating is a separate question).
>>> So, if we search over this LikelyReader we get a score estimate, which
>>> can hardly be exceeded by the same query over the original index.
>>> I suppose this might be useful for LTR as a better alternative to the
>>> query score feature.
>>>
>>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <m...@apache.org>
>>> wrote:
>>>
>>>> Hello dev!
>>>> Users are interested in the meaning of absolute value of the score, but
>>>> we always reply that it's just relative value. Maximum score of matched
>>>> docs is not an answer.
>>>> Ultimately we need to measure how much sense a query has in the index.
>>>> e.g. [jet OR propulsion OR spider] query should be measured like
>>>> nonsense, because the best matching docs have much lower scores than
>>>> hypothetical (and assuming absent) doc matching [jet AND propulsion AND
>>>> spider].
>>>> Could it be a method that returns the maximum possible score if all
>>>> query terms would match. Something like stubbing postings on virtual
>>>> all_matching doc with average stats like tf and field length and kicks
>>>> scorers in? It reminds me something about probabilistic retrieval, but not
>>>> much. Is there anything like this already?
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev