Re: Hybrid scoring lexical / vector

Alessandro Benedetti Thu, 25 May 2023 02:17:41 -0700

Hi all,
our approach to providing hybrid search in Solr has been focused on the
reranking side, specifically enabling vector-based features in Learning To
Rank.
In this way, you can combine lexical features (such as the original BM25
score) with various vector distances (in more than one field if you like)
and other factors using whatever model is supported (linear, tree-based,
neural network).
First-stage hybrid retrieval should already be reasonably well supported
through the boolean query parser.
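
To make the reranking idea concrete, here is a rough sketch in plain Python
of a linear model combining a lexical feature with a vector feature. It is
not Solr's Learning To Rank API: the Candidate class, the linear_rerank
function and the weights are hypothetical names and values chosen only for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    bm25: float        # lexical score from the first-stage query
    vector_sim: float  # e.g. cosine similarity between query and doc vectors


def linear_rerank(candidates, w_bm25=0.3, w_vector=0.7, bias=0.0):
    """Reorder candidates by a weighted sum of their features.

    The weights are placeholders; in a real setup they would be learned
    from relevance judgments rather than set by hand.
    """
    def score(c):
        return bias + w_bm25 * c.bm25 + w_vector * c.vector_sim

    return sorted(candidates, key=score, reverse=True)
```

Note that with a raw BM25 feature the lexical term can easily dominate the
sum, because BM25 has no upper bound while a cosine similarity stays within
[-1, 1].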


We started the work with function queries (which unfortunately are
scattered across Lucene and Solr; now that the projects are separate
again, it's a lengthy process).
Our first step is almost ready:
https://github.com/apache/lucene/pull/12253
Any feedback is welcome!

Then, regarding the separate problem of the unbounded relevance score
in Lucene/Solr, I agree that it can (and should) be improved. I would love to
see it as a probabilistic score, but I imagine that making this change in
Lucene will cause an enormous discussion, probably ending in a standstill.
You have my support!
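
For reference, the two squashing ideas Joel describes in the quoted message
below (a sigmoid wrapper and min/max scaling) could look roughly like this
in plain Python. This is only a sketch and not existing Lucene or Solr
code: the function names and the midpoint, steepness and weight parameters
are illustrative assumptions.

```python
import math


def squash(score, midpoint=0.0, steepness=1.0):
    """Map an unbounded score into (0, 1) with a sigmoid.

    midpoint and steepness are hypothetical tuning knobs; they would have
    to be chosen per collection, since BM25 ranges differ between indexes.
    """
    x = steepness * (score - midpoint)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow for very negative x.
    e = math.exp(x)
    return e / (1.0 + e)


def min_max_scale(scores):
    """Rescale a list of scores to [0, 1] relative to that list only."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(lexical, vector, weight=0.5):
    """Weighted sum of an already-bounded lexical score and a vector score."""
    return weight * lexical + (1.0 - weight) * vector
```

The trade-off between the two: the sigmoid gives the same value for the
same raw score on every query, while min/max scaling depends on the set of
documents being rescaled, so its output is only comparable within a single
result list.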


--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: [email protected]


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Tue, 23 May 2023 at 19:17, Mikhail Khludnev <[email protected]> wrote:

> Hello, Joel.
>
> Here's my idea
> https://lists.apache.org/thread/6t45p5fk4hldrt1833kvrbobdd2pk265
>
>
> On Tue, May 23, 2023 at 6:20 PM Joel Bernstein <[email protected]> wrote:
>
> > One of the things that I'm focusing on is combining the Solr similarity
> > score with the vector score in a consistent manner. My main concern is
> > dealing with the unbounded nature of the Solr similarity score and how to
> > balance that with a vector score.
> >
> > So my first question is: are there any mechanisms now to scale or squash the
> > Solr similarity score before combining with a vector score?
> >
> > Below are two ideas I have for squashing / scaling the score:
> >
> > 1) SquashingScoreQuery. This is a wrapper query that squashes the score of
> > its wrapped query using a sigmoid function.
> >
> > 2) Min/Max scale the main query score in the ReRanker. This simply adds a
> > flag to the ReRanker to min/max scale the main query scores before
> > combining with the ReRank query.
> >
> > Do others have thoughts on this?
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>
