Hi Alex

Thank you very much for your feedback and the various insights!

On 26.05.21 at 04:41, Alex K wrote:
Hi Michael and others,

Sorry, I'm just now getting back to you. For your three original questions:

- Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a
thorough response.
- As far as I know Opendistro is calling out to a C/C++ binary to run the
actual HNSW algorithm and store the HNSW part of the index. When they
implemented it about a year ago, Lucene did not have this yet. I assume the
Lucene HNSW implementation is solid, but would not be surprised if it's
slower than the C/C++ based implementation, given the JVM has some
disadvantages for these kinds of CPU-bound/number crunching algos.
- I just haven't had much time to invest in my benchmark recently. In
particular, I got stuck on why indexing was taking extremely long. Just
indexing the vectors would have easily exceeded the current time
limitations in the ANN-benchmarks project. Maybe I had some naive mistake
in my implementation, but I profiled and dug pretty deep to make it fast.

I am trying to get Julie's branch running

https://github.com/jtibshirani/lucene/tree/hnsw-bench

Maybe this will help, and the results should be comparable.



I'm assuming you want to use Lucene, but not necessarily via Elasticsearch?

Yes, for simpler setups I would like to use Lucene standalone, but for setups that have to scale I would use either Elasticsearch or Solr.

Thanks

Michael



If so, another option you might try for ANN is the elastiknn-models
and elastiknn-lucene packages. elastiknn-models contains the Locality
Sensitive Hashing implementations of ANN used by Elastiknn, and
elastiknn-lucene contains the Lucene queries used by Elastiknn. The Lucene
query is the MatchHashesAndScoreQuery
<https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-lucene/src/main/java/org/apache/lucene/search/MatchHashesAndScoreQuery.java#L18-L22>.
There are a couple of Scala test suites that show how to use it:
MatchHashesAndScoreQuerySuite
<https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-testing/src/test/scala/com/klibisz/elastiknn/query/MatchHashesAndScoreQuerySuite.scala>
and MatchHashesAndScoreQueryPerformanceSuite
<https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-testing/src/test/scala/com/klibisz/elastiknn/query/MatchHashesAndScoreQueryPerformanceSuite.scala>.
This is all designed to work independently from Elasticsearch and is
published on Maven: com.klibisz.elastiknn / lucene
<https://search.maven.org/artifact/com.klibisz.elastiknn/lucene/7.12.1.0/jar>
and
com.klibisz.elastiknn / models
<https://search.maven.org/artifact/com.klibisz.elastiknn/models/7.12.1.0/jar>.
The tests are Scala but all of the implementation is in Java.
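
To give a rough idea of what Locality Sensitive Hashing does for ANN, here is a minimal, self-contained sketch of one common LSH scheme (random hyperplanes for angular similarity). It is only an illustration under my own assumptions: it is not the Elastiknn API, and the class and method names (HyperplaneLshSketch, randomHyperplanes, hash, hammingDistance) are hypothetical.

```java
import java.util.Random;

// Illustrative only: random-hyperplane LSH for angular (cosine) similarity.
// This is NOT the Elastiknn API; all names here are hypothetical.
final class HyperplaneLshSketch {

    // Draw numBits random hyperplanes (Gaussian normal vectors) in `dims` dimensions.
    static float[][] randomHyperplanes(int numBits, int dims, long seed) {
        if (numBits < 1 || numBits > 31) {
            throw new IllegalArgumentException("numBits must be in [1, 31]");
        }
        Random rng = new Random(seed);
        float[][] planes = new float[numBits][dims];
        for (float[] plane : planes) {
            for (int d = 0; d < dims; d++) {
                plane[d] = (float) rng.nextGaussian();
            }
        }
        return planes;
    }

    // One bit per hyperplane: the bit is set when the vector lies on the positive side.
    static int hash(float[][] planes, float[] vector) {
        int bits = 0;
        for (int i = 0; i < planes.length; i++) {
            float dot = 0f;
            for (int d = 0; d < vector.length; d++) {
                dot += planes[i][d] * vector[d];
            }
            if (dot > 0f) {
                bits |= 1 << i;
            }
        }
        return bits;
    }

    // Number of differing bits; fewer differing bits suggests a smaller angle.
    static int hammingDistance(int a, int b) {
        return Integer.bitCount(a ^ b);
    }
}
```

In an LSH-based index, hashes like these are typically stored as terms, and documents that share hashes with the query vector are collected as candidates and then re-scored exactly.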

Thanks,
Alex



