Re: Maximum score estimation

Adrien Grand Wed, 22 May 2024 10:23:35 -0700

Hi Mikhail,

You are correct: it should give a reasonable upper bound on scores for term
queries and for combinations of term queries via BooleanQuery.
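
As a rough illustration of why such a bound exists for a sum of term scores, here is a minimal sketch. It does not use the Lucene API: it assumes a BM25-style formula with k1 = 1.2 and b = 0.75, and every name in it (MaxScoreSketch, termScore, maxScoreEstimate, normalise) is hypothetical. It scores an imaginary document that matches every query term with the largest term frequency and the shortest field length, which no real document within those limits can exceed under this formula.

```java
import java.util.Arrays;

/** Hypothetical, Lucene-independent model of a BM25-style score upper bound. */
final class MaxScoreSketch {
    // Assumed constants; not taken from the thread.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Inverse document frequency; positive whenever docFreq <= docCount. */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score of one term in one document; grows with tf, shrinks with fieldLength. */
    static double termScore(long docFreq, long docCount, double tf,
                            double fieldLength, double avgFieldLength) {
        double lengthNorm = K1 * (1.0 - B + B * fieldLength / avgFieldLength);
        return idf(docFreq, docCount) * tf / (tf + lengthNorm);
    }

    /**
     * Score of an imaginary document that matches every term with the largest
     * term frequency and the shortest field length. Because termScore is
     * monotonic in both, this bounds any document whose tf and length stay
     * within those limits.
     */
    static double maxScoreEstimate(long[] docFreqs, long docCount, double maxTf,
                                   double minFieldLength, double avgFieldLength) {
        return Arrays.stream(docFreqs)
                .mapToDouble(df -> termScore(df, docCount, maxTf, minFieldLength, avgFieldLength))
                .sum();
    }

    /** Maps a raw score into [0, 1] relative to the estimate. */
    static double normalise(double rawScore, double estimate) {
        if (estimate <= 0.0) {
            return 0.0;
        }
        return Math.min(1.0, rawScore / estimate);
    }

    public static void main(String[] args) {
        long[] docFreqs = {10, 50, 200};  // made-up document frequencies
        double estimate = maxScoreEstimate(docFreqs, 1000, 5, 5, 100);
        double real = termScore(10, 1000, 2, 100, 100);  // doc matching one term only
        System.out.println("estimate=" + estimate + " normalised=" + normalise(real, estimate));
    }
}
```

The statistics in main are invented for the example. In a real index the maximum term frequency and minimum field length would have to come from somewhere, and how well they can be estimated is exactly the open question in this thread.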


On Wed, May 22, 2024 at 6:57 PM Mikhail Khludnev <m...@apache.org> wrote:

> I'm trying to understand Impacts. Need help.
> https://github.com/apache/lucene/issues/5270#issuecomment-1223383919
> Does it mean
> advanceShallow(0)
> getMaxScore(maxDoc-1)
> gives a good max score estimate, at least for a term query?
>
> On Fri, May 10, 2024 at 11:21 PM Mikhail Khludnev <m...@apache.org> wrote:
>
>> Hello Alessandro.
>> Glad to hear!
>> There's not much of an update since the previously published link: just a
>> tiny test. Guessing max tf doesn't seem really reliable.
>> However, I've got another idea:
>> Can't Impacts give us an exact max score like
>> https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/search/Scorer.html#getMaxScore(int)?
>>
>> I don't know if it's possible and how to do it.
>>
>> On Thu, May 9, 2024 at 6:11 PM Alessandro Benedetti <a.benede...@sease.io>
>> wrote:
>>
>>> Hi Mikhail,
>>> I was thinking again about this regarding Hybrid Search in Solr and the
>>> current
>>> https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function
>>> .
>>> Was there any progress on this? Any traction?
>>> Sooner or later I hope to get some funds to work on this; I'll keep you
>>> updated!
>>> I agree this would be useful in Learning To Rank and Hybrid Search in
>>> general.
>>> The current original score feature is unlikely to be useful unless it is
>>> normalised by an estimated maximum score.
>>>
>>> Cheers
>>> --------------------------
>>> *Alessandro Benedetti*
>>> Director @ Sease Ltd.
>>> *Apache Lucene/Solr Committer*
>>> *Apache Solr PMC Member*
>>>
>>> e-mail: a.benede...@sease.io
>>>
>>>
>>> *Sease* - Information Retrieval Applied
>>> Consulting | Training | Open Source
>>>
>>> Website: Sease.io <http://sease.io/>
>>> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
>>> <https://twitter.com/seaseltd> | Youtube
>>> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
>>> <https://github.com/seaseltd>
>>>
>>>
>>> On Mon, 13 Feb 2023 at 12:47, Mikhail Khludnev <m...@apache.org> wrote:
>>>
>>>> Hello.
>>>> Just FYI. I sketched a little prototype:
>>>> https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
>>>> It estimates the maximum possible score for a query against an index:
>>>>  - it creates a virtual index (LikelyReader), which
>>>>  - contains all terms from the original index with the same docCount, and
>>>>  - matches all of these terms in the first doc (docnum=0) with the
>>>> maximum termFreq (estimating which is a separate question).
>>>> So, if we search over this LikelyReader we get a score estimate, which
>>>> can hardly be exceeded by the same query over the original index.
>>>> I suppose this might be useful for LTR as a better alternative to the
>>>> query score feature.
>>>>
>>>> On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <m...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello dev!
>>>>> Users are interested in the meaning of the absolute value of the score,
>>>>> but we always reply that it's just a relative value. The maximum score
>>>>> of the matched docs is not an answer.
>>>>> Ultimately we need to measure how much sense a query makes in the index.
>>>>> e.g. the query [jet OR propulsion OR spider] should be rated as
>>>>> nonsense, because the best matching docs have much lower scores than a
>>>>> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
>>>>> spider].
>>>>> Could there be a method that returns the maximum possible score if all
>>>>> query terms matched? Something like stubbing postings for a virtual
>>>>> all_matching doc with average stats such as tf and field length, and
>>>>> kicking the scorers in? It reminds me somewhat of probabilistic
>>>>> retrieval, but not much. Is there anything like this already?
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>>
>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Adrien
