kaivalnp opened a new pull request, #15784: URL: https://github.com/apache/lucene/pull/15784
### Description

Lucene added support for similarity-based vector search in #12679: a vector query that aims to return _all_ results whose similarity to the query vector is above a threshold (`resultSimilarity`), as opposed to a KNN query, which aims to return the `topK` highest-scoring results.

`[Byte|Float]VectorSimilarityQuery` provides an approximate search for this, using a [special collector](https://github.com/apache/lucene/blob/f021aa55853c8b446404c8616ec247027774ae07/lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java#L27) to traverse and collect results from existing HNSW graphs.

The search algorithm in the upper levels of the HNSW graph is the same as for KNN: it finds the single best entry point for the actual search in the last layer. In the last layer, starting from the entry node, all nodes scoring above a user-specified `traversalSimilarity` are traversed, and all traversed nodes scoring above `resultSimilarity` are collected as results.

To protect against the adversarial case of the entry node lying outside `traversalSimilarity`, the collector has an additional clause that continues traversal as long as better-scoring nodes are available (i.e. the search keeps moving towards the vicinity of the query). However, this clause is susceptible to getting caught in a local maximum, so the search can terminate before it gets near the query.

Another hassle is choosing a `traversalSimilarity` that gives a good recall vs. latency tradeoff: some queries in sparse spaces need a larger buffer, which is unnecessary in denser spaces.

To counter both of these, this PR proposes making the graph traversal similarity adaptive: it starts at a low value and moves towards `resultSimilarity` with an exponential decay on encountering low-scoring nodes.
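The adaptive idea described above can be illustrated with a toy sketch. This is not the code in this PR and does not use any Lucene API: the class and method names are hypothetical, the graph is a plain adjacency array with precomputed scores, traversal is a simple breadth-first walk rather than Lucene's best-first HNSW search, and the decay rule (multiplying the remaining gap to `resultSimilarity` by a constant factor on each low-scoring node) is only one assumed reading of "exponential decay".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy illustration only; none of these names exist in Lucene. */
final class AdaptiveTraversalSketch {

  /** Hypothetical threshold that starts low and closes its gap to resultSimilarity. */
  static final class AdaptiveThreshold {
    private final float resultSimilarity;
    private final float decay; // assumed factor in (0, 1), not taken from the PR
    private float gap; // how far the threshold currently sits below resultSimilarity

    AdaptiveThreshold(float initial, float resultSimilarity, float decay) {
      this.resultSimilarity = resultSimilarity;
      this.decay = decay;
      this.gap = Math.max(0f, resultSimilarity - initial);
    }

    float current() {
      return resultSimilarity - gap;
    }

    /** Shrinks the gap geometrically whenever a node scores below resultSimilarity. */
    void observe(float score) {
      if (score < resultSimilarity) {
        gap *= decay;
      }
    }
  }

  /**
   * Walks the toy graph from the entry node, expanding only nodes that score at or above the
   * current traversal threshold, and returns the nodes scoring at or above resultSimilarity.
   */
  static List<Integer> search(
      int[][] neighbors, float[] scores, int entry,
      float initialThreshold, float resultSimilarity, float decay) {
    AdaptiveThreshold threshold =
        new AdaptiveThreshold(initialThreshold, resultSimilarity, decay);
    boolean[] visited = new boolean[scores.length];
    Deque<Integer> queue = new ArrayDeque<>();
    List<Integer> results = new ArrayList<>();
    queue.add(entry);
    visited[entry] = true;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      float score = scores[node];
      if (score >= resultSimilarity) {
        results.add(node); // close enough to the query to be a result
      }
      // Decide on expansion with the threshold as it stood before this node was observed.
      boolean expand = score >= threshold.current();
      threshold.observe(score);
      if (!expand) {
        continue; // too far from the query: do not follow this node's edges
      }
      for (int next : neighbors[node]) {
        if (!visited[next]) {
          visited[next] = true;
          queue.add(next);
        }
      }
    }
    return results;
  }
}
```

On a five-node chain 0-1-2-3-4 with scores {0.2, 0.5, 0.9, 0.95, 0.3}, entry node 0 and `resultSimilarity = 0.8`, starting the threshold at 0.0 with a decay of 0.5 lets the walk pass through the low-scoring entry region and return nodes 2 and 3. Starting the threshold at 0.8 (no gap, so effectively a fixed threshold) stops at the entry node and returns nothing, which is the adversarial entry-point case mentioned above.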
