Re: Maximum score estimation

Mikhail Khludnev Fri, 10 May 2024 13:21:45 -0700

Hello Alessandro.
Glad to hear!
There hasn't been much progress since the previously published link: just a tiny
test. Guessing the max tf doesn't seem very reliable.
However, I've got another idea:
Can't Impacts give us an exact max score like
https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)?


I don't know whether it's possible or how to do it.
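
A minimal sketch of the Impacts idea, under loud assumptions: the Impact class
below is a hypothetical stand-in for a (term frequency, field length) pair and is
not Lucene's own class, the scoring function is a BM25-style formula with assumed
defaults k1=1.2 and b=0.75, and every name in it is illustrative. It takes the
best score over each term's impacts and sums those maxima over the query terms.

```java
import java.util.Arrays;
import java.util.List;

public class MaxScoreSketch {
    // Assumed BM25 parameters (common defaults); not taken from the thread.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Hypothetical stand-in for one impact: a (term frequency, field length) pair. */
    public static final class Impact {
        final int freq;
        final int fieldLength;

        public Impact(int freq, int fieldLength) {
            this.freq = freq;
            this.fieldLength = fieldLength;
        }
    }

    /** BM25-style inverse document frequency. */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25-style score of one term in one document. */
    public static double score(double idf, int freq, int fieldLength, double avgFieldLength) {
        double lengthNorm = K1 * (1 - B + B * fieldLength / avgFieldLength);
        return idf * freq / (freq + lengthNorm);
    }

    /** Best score any listed impact of a single term can reach. */
    public static double maxTermScore(double idf, List<Impact> impacts, double avgFieldLength) {
        double max = 0;
        for (Impact impact : impacts) {
            max = Math.max(max, score(idf, impact.freq, impact.fieldLength, avgFieldLength));
        }
        return max;
    }

    /** Sum of per-term maxima: an upper bound for a query whose terms all match. */
    public static double maxQueryScore(double[] idfs, List<List<Impact>> impactsPerTerm,
                                       double avgFieldLength) {
        double sum = 0;
        for (int i = 0; i < idfs.length; i++) {
            sum += maxTermScore(idfs[i], impactsPerTerm.get(i), avgFieldLength);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up statistics, purely for illustration.
        double avgFieldLength = 20.0;
        double[] idfs = {idf(1000, 10), idf(1000, 200)};
        List<List<Impact>> impacts = Arrays.asList(
                Arrays.asList(new Impact(1, 5), new Impact(3, 50)),
                Arrays.asList(new Impact(2, 10), new Impact(7, 120)));
        System.out.println("estimated max score: " + maxQueryScore(idfs, impacts, avgFieldLength));
    }
}
```

The sum is only an upper bound: the per-term maxima may come from different
documents, so no single document necessarily reaches it.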

On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti <[email protected]>
wrote:

> Hi Mikhail,
> I was thinking about this again in relation to Hybrid Search in Solr and the
> current scale function:
> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
> Was there any progress on this? Any traction?
> Sooner or later I hope to get some funds to work on this; I'll keep you
> updated!
> I agree this would be useful in Learning To Rank and Hybrid Search in
> general.
> The current original score feature is unlikely to be useful unless it is
> normalised by an estimated maximum score.
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: [email protected]
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <[email protected]> wrote:
>
>> Hello.
>> Just FYI: I sketched a little prototype
>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>> to estimate the maximum possible score for a query against an index:
>>  - it creates a virtual index (LikelyReader), which
>>  - contains all terms from the original index with the same docCount,
>>  - with all of these terms matching in the first doc (docnum=0) with the
>> maximum termFreq (estimating which is a separate question).
>> So, if we search over this LikelyReader we get a score estimate, which
>> can hardly be exceeded by the same query over the original index.
>> I suppose this might be useful for LTR as a better alternative to the
>> query score feature.
>>
>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <[email protected]> wrote:
>>
>>> Hello dev!
>>> Users are interested in the meaning of the absolute value of the score, but
>>> we always reply that it's just a relative value. The maximum score of the
>>> matched docs is not an answer.
>>> Ultimately we need to measure how much sense a query makes in the index.
>>> E.g. the [jet OR propulsion OR spider] query should be measured as
>>> nonsense, because the best matching docs have much lower scores than a
>>> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
>>> spider].
>>> Could there be a method that returns the maximum possible score if all
>>> query terms matched? Something like stubbing postings on a virtual
>>> all_matching doc with average stats like tf and field length, and kicking
>>> the scorers in? It reminds me of something from probabilistic retrieval, but
>>> not much. Is there anything like this already?
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>

-- 
Sincerely yours
Mikhail Khludnev
