Hi Alex,

Thank you very much for your feedback and the various insights!
On 26.05.21 at 04:41, Alex K wrote:
Hi Michael and others,

Sorry just now getting back to you. For your three original questions:

- Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a thorough response.
- As far as I know Opendistro is calling out to a C/C++ binary to run the actual HNSW algorithm and store the HNSW part of the index. When they implemented it about a year ago, Lucene did not have this yet. I assume the Lucene HNSW implementation is solid, but would not be surprised if it's slower than the C/C++ based implementation, given the JVM has some disadvantages for these kinds of CPU-bound/number crunching algos.
- I just haven't had much time to invest into my benchmark recently. In particular, I got stuck on why indexing was taking extremely long. Just indexing the vectors would have easily exceeded the current time limitations in the ANN-benchmarks project. Maybe I had some naive mistake in my implementation, but I profiled and dug pretty deep to make it fast.
I am trying to get Julie's branch running: https://github.com/jtibshirani/lucene/tree/hnsw-bench
Maybe this will help and be comparable.
I'm assuming you want to use Lucene, but not necessarily via Elasticsearch?
Yes, for simpler setups I would like to use Lucene standalone, but for setups that have to scale I would use either Elasticsearch or Solr.
Thanks

Michael
If so, another option you might try for ANN is the elastiknn-models and elastiknn-lucene packages. elastiknn-models contains the Locality Sensitive Hashing implementations of ANN used by Elastiknn, and elastiknn-lucene contains the Lucene queries used by Elastiknn.

The Lucene query is the MatchHashesAndScoreQuery <https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-lucene/src/main/java/org/apache/lucene/search/MatchHashesAndScoreQuery.java#L18-L22>.

There are a couple of Scala test suites that show how to use it:

- MatchHashesAndScoreQuerySuite <https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-testing/src/test/scala/com/klibisz/elastiknn/query/MatchHashesAndScoreQuerySuite.scala>
- MatchHashesAndScoreQueryPerformanceSuite <https://github.com/alexklibisz/elastiknn/blob/master/elastiknn-testing/src/test/scala/com/klibisz/elastiknn/query/MatchHashesAndScoreQueryPerformanceSuite.scala>

This is all designed to work independently from Elasticsearch and is published on Maven:

- com.klibisz.elastiknn / lucene <https://search.maven.org/artifact/com.klibisz.elastiknn/lucene/7.12.1.0/jar>
- com.klibisz.elastiknn / models <https://search.maven.org/artifact/com.klibisz.elastiknn/models/7.12.1.0/jar>

The tests are Scala but all of the implementation is in Java.

Thanks,
Alex
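
For readers new to the "match hashes, then score" idea behind an LSH-based ANN query, here is a minimal, self-contained Java sketch. It is not Elastiknn's code and does not use its API or Lucene at all. The class name LshSketch, its parameters, and the choice of random-hyperplane hashing for cosine similarity are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy LSH index (hypothetical, for illustration): match hashes first, then score exactly. */
class LshSketch {
    private final float[][] planes;   // numTables * bitsPerTable random hyperplanes
    private final int numTables;
    private final int bitsPerTable;   // must be at most 31 so a hash fits in an int
    private final List<Map<Integer, List<Integer>>> tables = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    LshSketch(int dims, int numTables, int bitsPerTable, long seed) {
        this.numTables = numTables;
        this.bitsPerTable = bitsPerTable;
        Random rng = new Random(seed);
        planes = new float[numTables * bitsPerTable][dims];
        for (float[] p : planes)
            for (int i = 0; i < dims; i++) p[i] = (float) rng.nextGaussian();
        for (int t = 0; t < numTables; t++) tables.add(new HashMap<>());
    }

    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static float cosine(float[] a, float[] b) {
        return dot(a, b) / (float) Math.sqrt(dot(a, a) * dot(b, b));
    }

    /** One hash per table: the sign pattern of v against that table's hyperplanes. */
    int[] hashes(float[] v) {
        int[] out = new int[numTables];
        for (int t = 0; t < numTables; t++) {
            int h = 0;
            for (int b = 0; b < bitsPerTable; b++)
                if (dot(planes[t * bitsPerTable + b], v) >= 0f) h |= 1 << b;
            out[t] = h;
        }
        return out;
    }

    /** Stores the vector under each of its hashes and returns its id. */
    int add(float[] v) {
        int id = vectors.size();
        vectors.add(v);
        int[] hs = hashes(v);
        for (int t = 0; t < numTables; t++)
            tables.get(t).computeIfAbsent(hs[t], k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Up to k ids: the `candidates` best hash matches, re-ranked by exact cosine similarity. */
    List<Integer> search(float[] q, int candidates, int k) {
        Map<Integer, Integer> matchCounts = new HashMap<>();
        int[] hs = hashes(q);
        for (int t = 0; t < numTables; t++)
            for (int id : tables.get(t).getOrDefault(hs[t], Collections.emptyList()))
                matchCounts.merge(id, 1, Integer::sum);
        List<Integer> ids = new ArrayList<>(matchCounts.keySet());
        // Step 1: keep the ids that collided with the query in the most tables.
        ids.sort((a, b) -> matchCounts.get(b) - matchCounts.get(a));
        ids = new ArrayList<>(ids.subList(0, Math.min(candidates, ids.size())));
        // Step 2: score only those candidates with the exact similarity.
        ids.sort((a, b) -> Float.compare(cosine(vectors.get(b), q), cosine(vectors.get(a), q)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        LshSketch index = new LshSketch(3, 4, 8, 42L);
        index.add(new float[] {1f, 0f, 0f});
        index.add(new float[] {0f, 1f, 0f});
        index.add(new float[] {-1f, 0f, 0f});
        System.out.println(index.search(new float[] {1f, 0f, 0f}, 10, 1));
    }
}
```

The trade-off this sketch is meant to show: more tables or a larger candidate count improve recall at the cost of more exact similarity computations, while more bits per table make each bucket smaller and more selective.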