Hi Michael
Thank you for your explanations!
I am currently trying to implement it, learning from the code at
https://github.com/jtibshirani/lucene/blob/hnsw-bench/lucene/core/src/java/org/apache/lucene/search/PythonEntryPoint.java
although Julie told me that this code is a bit out of date, but should be
updated very soon.
It would be great to have some example code, similar to what is
already available for other parts of Lucene:
https://lucene.apache.org/core/8_8_2/core/overview-summary.html#overview.description
Does this already exist? If not, I could try to create some and
contribute it to the documentation.
Thanks
Michael
On 24.05.21 at 05:22, Michael Sokolov wrote:
Hi Michael, that is fully functional in the sense that Lucene will
build an HNSW graph for a vector-valued field, and you can then use the
VectorReader.search method to do KNN-based search. Next steps may
include some integration with lexical, inverted-index type search so
that you can retrieve the N closest vectors subject to other constraints.
Today you can approximate that by oversampling and filtering. There is
also interest in pursuing other KNN search algorithms, and we have
been working to make sure the VectorFormat API (might still get
renamed due to confusion with other kinds of vectors existing in
Lucene) can support alternative KNN implementations.
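To illustrate the oversampling-and-filtering approach mentioned above, here is a minimal sketch in plain Python. It does not use Lucene: the brute-force knn function is only a hypothetical stand-in for an HNSW lookup such as VectorReader.search, and the names filtered_knn, accept and oversample, as well as the sample vectors and language labels, are assumptions made up for illustration.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hit = Tuple[int, float]  # (doc_id, similarity score)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(vectors: Dict[int, Sequence[float]], query: Sequence[float], k: int) -> List[Hit]:
    """Hypothetical stand-in for a KNN index lookup: exact top-k by brute force."""
    scored = ((doc_id, cosine_similarity(vec, query)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda hit: hit[1])


def filtered_knn(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    k: int,
    accept: Callable[[int], bool],
    oversample: int = 4,
) -> List[Hit]:
    """Fetch k * oversample nearest candidates, drop those rejected by the
    filter, and keep at most k of the survivors (best first)."""
    candidates = knn(vectors, query, k * oversample)
    hits = [hit for hit in candidates if accept(hit[0])]
    return hits[:k]


# Made-up sample data: six 2-d vectors and a language label per document.
vectors = {
    0: [1.0, 0.0],
    1: [0.9, 0.1],
    2: [0.8, 0.2],
    3: [0.0, 1.0],
    4: [0.7, 0.3],
    5: [0.1, 0.9],
}
languages = {0: "de", 1: "en", 2: "de", 3: "en", 4: "en", 5: "en"}

if __name__ == "__main__":
    # The two nearest English documents to the query vector.
    for doc_id, score in filtered_knn(
        vectors, [1.0, 0.0], 2, lambda d: languages[d] == "en", oversample=2
    ):
        print(doc_id, round(score, 3))
```

With oversample=1 the same query returns only one hit, because the filter discards one of the two candidates. That is the reason for fetching more than k candidates up front, and also why this remains an approximation: a very selective filter can still leave fewer than k results.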
On Wed, May 19, 2021 at 12:22 PM Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi Alex
Just to make sure I understand better what the additions are about:
On 21.04.21 at 17:21, Alex K wrote:
There were a couple of additions recently merged into Lucene but not yet
released:
- A first-class vector codec
Do you mean the classes inside
https://github.com/apache/lucene/tree/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90
and in particular
Lucene90HnswVectorFormat.java Lucene90HnswVectorReader.java
Lucene90HnswVectorWriter.java
?
- An implementation of HNSW for approximate nearest neighbor search
Is the HNSW implementation at
https://github.com/apache/lucene/tree/main/lucene/core/src/java/org/apache/lucene/util/hnsw
similar to
https://opendistro.github.io/for-elasticsearch/blog/odfe-updates/2020/04/Building-k-Nearest-Neighbor-(k-NN)-Similarity-Search-Engine-with-Elasticsearch/
?
They are however available in the snapshot releases. I started on a small
project to get the HNSW implementation into the ann-benchmarks project, but
had to set it aside.
Is there still something missing? Or what would be the next steps?
Thanks
Michael
Here's the code:
https://github.com/alexklibisz/ann-benchmarks-lucene. There are some test
suites that index and search Glove vectors. My first impression was that
indexing seems surprisingly slow, but it's entirely possible I'm doing
something wrong.
On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner <michael.wech...@wyona.com>
wrote:
Hi
I recently found the following articles re Lucene/Solr and BERT
https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28
https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
and would like to ask whether there might be more recent developments
within the Lucene/Solr community re BERT integration?
Also, how do these developments relate to
https://sbert.net/
?
Thanks very much for your insights!
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org