Hi Michael,

The implementation is fully functional in the sense that Lucene will build an HNSW graph for a vector-valued field, and you can then use the VectorReader.search method to do KNN-based search. Next steps may include some integration with lexical, inverted-index search so that you can retrieve the N closest vectors subject to other constraints. Today you can approximate that by oversampling (asking for more than N neighbors) and then filtering the results.

There is also interest in pursuing other KNN search algorithms, and we have been working to make sure the VectorFormat API can support alternative KNN implementations. (The API might still get renamed, because the name is easily confused with other kinds of vectors that already exist in Lucene.)
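To make the oversample-and-filter idea concrete, here is a minimal, self-contained sketch. It does not call any Lucene API: the class and method names are hypothetical, and the brute-force nearest() method is only a stand-in for the HNSW-backed search. The point is the shape of the workaround: ask for k * oversample neighbors, then keep the first k that pass the filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names come from Lucene.
class OversampleAndFilter {

    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Stand-in for the KNN search: exact brute-force scan that returns the
     * ids of the n vectors closest to the query, nearest first.
     */
    static List<Integer> nearest(float[][] vectors, float[] query, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(
                (Integer id) -> squaredDistance(vectors[id], query)));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    /**
     * Fetch k * oversample candidates, then keep the first k accepted by
     * the filter. May return fewer than k hits if too few candidates pass.
     */
    static List<Integer> filteredNearest(float[][] vectors, float[] query,
                                         int k, int oversample,
                                         IntPredicate accept) {
        List<Integer> candidates = nearest(vectors, query, k * oversample);
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates) {
            if (accept.test(id)) {
                hits.add(id);
                if (hits.size() == k) {
                    break;
                }
            }
        }
        return hits;
    }
}
```

Note the limitation: if the filter is very selective, even a large oversampling factor can return fewer than k results, which is one reason tighter integration with the inverted index would be useful.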
On Wed, May 19, 2021 at 12:22 PM Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Hi Alex
>
> Just to make sure I understand better what the additions are about
>
> On 21.04.21 at 17:21, Alex K wrote:
> > There were a couple additions recently merged into lucene but not yet
> > released:
> > - A first-class vector codec
>
> do you mean the classes inside
>
> https://github.com/apache/lucene/tree/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90
>
> and in particular
>
> Lucene90HnswVectorFormat.java Lucene90HnswVectorReader.java
> Lucene90HnswVectorWriter.java
>
> ?
>
> > - An implementation of HNSW for approximate nearest neighbor search
>
> the HNSW implementation at
>
> https://github.com/apache/lucene/tree/main/lucene/core/src/java/org/apache/lucene/util/hnsw
>
> is similar to
>
> https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/04/Building-k-Nearest-Neighbor-(k-NN)-Similarity-Search-Engine-with-Elasticsearch/
>
> ?
>
> >
> > They are however available in the snapshot releases. I started on a small
> > project to get the HNSW implementation into the ann-benchmarks project, but
> > had to set it aside.
>
> Is there still something missing? Or what would be the next steps?
>
> Thanks
>
> Michael
>
>
> > Here's the code:
> > https://github.com/alexklibisz/ann-benchmarks-lucene. There are some test
> > suites that index and search Glove vectors. My first impression was that
> > indexing seems surprisingly slow, but it's entirely possible I'm doing
> > something wrong.
> >
> > On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner <michael.wech...@wyona.com>
> > wrote:
> >
> >> Hi
> >>
> >> I recently found the following articles re Lucene/Solr and BERT
> >>
> >> https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
> >>
> >> https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
> >>
> >> and would like to ask whether there might be more recent developments
> >> within the Lucene/Solr community re BERT integration?
> >>
> >> Also how these developments relate to
> >>
> >> https://sbert.net/
> >>
> >> ?
> >>
> >> Thanks very much for your insights!
> >>
> >> Michael
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org