Re: Maximum score estimation

Mikhail Khludnev Wed, 22 May 2024 09:57:04 -0700

I'm trying to understand Impacts and need some help.
https://github.com/apache/lucene/issues/5270#issuecomment-1223383919
Does it mean that calling
advanceShallow(0)
getMaxScore(maxDoc-1)
gives a good max score estimate, at least for a term query?
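
To make the question concrete: as far as I understand, Impacts keep
(freq, norm) pairs for a term, and a max score would be the best
similarity score over those pairs. Below is a minimal plain-Java sketch
of that idea only. It does not call Lucene; the class MaxScoreSketch,
the Impact type, the sample numbers, and the BM25-style formula with
k1=1.2 and b=0.75 are all assumptions made for illustration.

```java
import java.util.List;

public class MaxScoreSketch {

    /** Hypothetical stand-in for one (term frequency, document length) pair. */
    public static final class Impact {
        public final int freq;
        public final long docLength;

        public Impact(int freq, long docLength) {
            this.freq = freq;
            this.docLength = docLength;
        }
    }

    // Assumed BM25 parameters (common defaults, not read from any index).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** BM25-style inverse document frequency. */
    public static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Saturating BM25-style term score; it grows with freq and shrinks with docLength. */
    public static double score(double idf, int freq, long docLength, double avgDocLength) {
        double norm = K1 * (1 - B + B * docLength / avgDocLength);
        return idf * freq / (freq + norm);
    }

    /** Upper bound for one term: the best score over all stored pairs. */
    public static double maxScore(List<Impact> impacts, long docFreq, long docCount,
                                  double avgDocLength) {
        double idf = idf(docFreq, docCount);
        double max = 0;
        for (Impact impact : impacts) {
            max = Math.max(max, score(idf, impact.freq, impact.docLength, avgDocLength));
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up pairs: a short doc with low tf, a medium doc, a long doc with high tf.
        List<Impact> impacts = List.of(
            new Impact(1, 5), new Impact(3, 40), new Impact(7, 200));
        System.out.println(maxScore(impacts, 3, 1000, 50.0));
    }
}
```

With these made-up numbers the winning pair is (freq=3, length=40),
neither the highest freq nor the shortest doc, which is why guessing
max tf alone may not be enough.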


On Fri, May 10, 2024 at 11:21 PM Mikhail Khludnev <[email protected]> wrote:

> Hello Alessandro.
> Glad to hear!
> There's not much of an update since the previously published link: just a
> tiny test. Guessing max tf doesn't seem very reliable.
> However, I've got another idea:
> Can't Impacts give us an exact max score like
> https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)?
>
> I don't know whether it's possible or how to do it.
>
> On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti <[email protected]>
> wrote:
>
>> Hi Mikhail,
>> I was thinking again about this regarding Hybrid Search in Solr and the
>> current
>> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
>> .
>> Was there any progress on this? Any traction?
>> Sooner or later I hope to get some funds to work on this; I'll keep you
>> updated!
>> I agree this would be useful in Learning To Rank and Hybrid Search in
>> general.
>> The current original score feature is unlikely to be useful unless it is
>> normalised by an estimated maximum score.
>>
>> Cheers
>> --------------------------
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: [email protected]
>>
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io <http://sease.io/>
>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>> <https://twitter.com/seaseltd> | Youtube
>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>> <https://github.com/seaseltd>
>>
>>
>> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <[email protected]> wrote:
>>
>>> Hello.
>>> Just FYI. I sketched a little prototype:
>>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>>> It estimates the maximum possible score for a query against an index:
>>>  - it creates a virtual index (LikelyReader), which
>>>  - contains all terms from the original index with the same docCount, and
>>>  - matches all of these terms in the first doc (docnum=0) with the
>>>    maximum termFreq (estimating which is a separate question).
>>> So, if we search over this LikelyReader we get a score estimate, which
>>> can hardly be exceeded by the same query over the original index.
>>> I suppose this might be useful for LTR as a better alternative to the
>>> query score feature.
>>>
>>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <[email protected]>
>>> wrote:
>>>
>>>> Hello dev!
>>>> Users are interested in the meaning of the absolute value of the score,
>>>> but we always reply that it's just a relative value. The maximum score of
>>>> the matched docs is not an answer.
>>>> Ultimately we need to measure how much sense a query makes in the index.
>>>> E.g. the query [jet OR propulsion OR spider] should be measured as
>>>> nonsense, because the best matching docs have much lower scores than a
>>>> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
>>>> spider].
>>>> Could there be a method that returns the maximum possible score if all
>>>> query terms matched? Something like stubbing postings on a virtual
>>>> all_matching doc with average stats like tf and field length, and then
>>>> kicking the scorers in? It reminds me of something in probabilistic
>>>> retrieval, but not much. Is there anything like this already?
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
