I think it would be good to provide something like a VectorRerankField
(sorry for the bad name, maybe FastVectorField would be amusing too)
that just stores vectors as docvalues (no HNSW) and has a
newRescorer() method returning an implementation of
org.apache.lucene.search.Rescorer. Then it's easy to do what that
document describes: pull the top 500 hits with BM25 and rerank them
with your vectors. That is very fast, since only 500 similarity
calculations are required and no HNSW or anything else is needed. Of
course, you could also use a vector search instead of a BM25 search as
the initial search to pull the top 500 hits.
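
To make the rerank step concrete, here is a rough sketch in plain Java.
It is not Lucene code and not a proposal for the actual API: the names
RerankSketch, Hit and rerank are made up for illustration, the vectors
are held in an in-memory map standing in for docvalues, and dot product
is just one assumed choice of similarity. It takes the first-pass hits
(from BM25 or anything else), scores each against the query vector once,
and re-sorts them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, no Lucene classes involved. */
final class RerankSketch {

    /** One hit: a document id and the score currently assigned to it. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Dot product of two vectors of the same dimension. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Rescores the first-pass hits against the query vector and returns
     * them sorted by the new score, highest first. One similarity
     * computation per hit, so 500 hits means 500 dot products.
     * Hits without a stored vector are sorted to the end (an assumption
     * made for this sketch).
     */
    static List<Hit> rerank(List<Hit> firstPass,
                            Map<Integer, float[]> vectorsByDoc,
                            float[] query) {
        List<Hit> rescored = new ArrayList<>(firstPass.size());
        for (Hit hit : firstPass) {
            float[] vector = vectorsByDoc.get(hit.docId);
            float newScore = (vector == null)
                    ? Float.NEGATIVE_INFINITY
                    : dot(query, vector);
            rescored.add(new Hit(hit.docId, newScore));
        }
        // Descending by score; ties broken by ascending doc id.
        rescored.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        return rescored;
    }
}
```

In Lucene itself the same step would presumably sit behind
Rescorer.rescore(searcher, firstPassTopDocs, topN), with the vectors
read from docvalues per segment rather than from a map.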

So it could meet both use-cases and provide a really performant option
for users that want to integrate vector search.

On Fri, Feb 10, 2023 at 10:21 AM Michael Wechner
<michael.wech...@wyona.com> wrote:
>
> Hi
>
> I use the vector search of Lucene, with embeddings that I get from
> SentenceBERT, for example.
>
> According to
>
> https://www.sbert.net/examples/applications/retrieve_rerank/README.html
>
> a re-ranking with a cross-encoder after the vector search (bi-encoding)
> can improve the ranking.
>
> Would it make sense to add this kind of functionality to Lucene or is
> somebody already working on something similar?
>
> Thanks
>
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
