iprithv opened a new pull request, #16047:
URL: https://github.com/apache/lucene/pull/16047

## Description

This PR implements a proof of concept for sibling scoring in block-join diversified child vector search, as discussed in [#15839: Maybe Improve join block Vector search performance by block scoring child vectors](https://github.com/apache/lucene/issues/15839).

Today, `DiversifyingChildrenFloatKnnVectorQuery` and `DiversifyingChildrenByteKnnVectorQuery` can return whichever child vector HNSW reached first for a parent. This happens even when another sibling in the same parent block has a higher similarity.

This PR adds a post-HNSW rescoring pass, on the approximate path only. For each provisional hit, the pass does the following:

- It iterates over all live child doc IDs in that parent's block, in ascending doc order so the `VectorScorer` only moves forward.
- It rescores each child with the real `VectorScorer`.
- It keeps the best child.

The hits are then re-sorted by score. `acceptDocs` is respected, and exact search is unchanged.

This is intentionally smaller than the exploratory design in the issue, which includes:

- a collector that drives scoring during graph traversal;
- visited tracking;
- richer per-parent aggregates such as min/max/top-n.

This PR is meant to demonstrate correctness within the block for parents HNSW has already surfaced, at a measurable cost.

---

## Benchmarks (JMH)

| Children per parent | Baseline | With block rescore |
|---------------------|----------|--------------------|
| 16 | ~0.17 ms/op | ~0.22 ms/op |
| 64 | ~0.26 ms/op | ~0.42 ms/op |

Overhead **scales with block width**. It is roughly +0.05 ms/op at 16 children per parent and roughly +0.16 ms/op at 64. It also grows with `topK`, since more blocks are rescored. The change trades CPU for correctness within each parent block.
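For readers less familiar with the block-join layout, the rescoring pass described under Description can be sketched in plain Java as follows. This is a hypothetical, self-contained sketch. It is neither the code in this PR nor Lucene's API, and several substitutions are made:

- `Hit`, `ForwardOnlyScorer`, and `rescoreBlocks` are illustrative names.
- `java.util.BitSet` stands in for the parent bitset and for `acceptDocs`.
- A precomputed similarity array stands in for `VectorScorer`.

The sketch also makes these assumptions:

- It follows the block-join convention that each parent doc comes right after its children.
- It expects at most one provisional hit per parent, as a diversifying query yields.
- It visits the hits themselves in ascending doc order so that one forward-only scorer can serve every block. This cross-block ordering is a choice made for the sketch, not something stated in the PR.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

public class BlockRescoreSketch {

  /** Hypothetical provisional hit: a child doc id and its similarity score. */
  record Hit(int childDoc, float score) {}

  /**
   * Hypothetical stand-in for a forward-only vector scorer: it reads from a
   * precomputed similarity array and throws if asked to move backwards.
   */
  static final class ForwardOnlyScorer {
    private final float[] similarities;
    private int lastDoc = -1;

    ForwardOnlyScorer(float[] similarities) {
      this.similarities = similarities;
    }

    float score(int doc) {
      if (doc <= lastDoc) {
        throw new IllegalStateException("cannot move backwards: " + doc + " <= " + lastDoc);
      }
      lastDoc = doc;
      return similarities[doc];
    }
  }

  /**
   * Rescores each provisional hit against every accepted sibling in its parent
   * block, keeps the best child per block, then re-sorts by descending score.
   * Assumes children precede their parent doc and at most one hit per parent.
   */
  static List<Hit> rescoreBlocks(
      List<Hit> hits, BitSet parents, BitSet acceptDocs, ForwardOnlyScorer scorer) {
    // Visit blocks in ascending doc order so the scorer only ever moves forward.
    List<Hit> byDoc = new ArrayList<>(hits);
    byDoc.sort(Comparator.comparingInt(Hit::childDoc));

    List<Hit> rescored = new ArrayList<>();
    for (Hit hit : byDoc) {
      int parent = parents.nextSetBit(hit.childDoc());          // block ends at its parent
      int firstChild = parents.previousSetBit(parent - 1) + 1;  // block starts after the previous parent
      int bestDoc = hit.childDoc();
      float bestScore = Float.NEGATIVE_INFINITY;
      for (int child = firstChild; child < parent; child++) {
        if (acceptDocs != null && !acceptDocs.get(child)) {
          continue; // skip siblings excluded by acceptDocs (e.g. deleted docs)
        }
        float s = scorer.score(child);
        if (s > bestScore) { // strict '>' keeps the lowest doc id on ties
          bestScore = s;
          bestDoc = child;
        }
      }
      rescored.add(new Hit(bestDoc, bestScore));
    }
    // Rescoring can change a block's score, so re-sort best first.
    rescored.sort((a, b) -> Float.compare(b.score(), a.score()));
    return rescored;
  }

  public static void main(String[] args) {
    // Three blocks: children 0-2 -> parent 3, children 4-5 -> parent 6, children 7-9 -> parent 10.
    BitSet parents = new BitSet();
    parents.set(3);
    parents.set(6);
    parents.set(10);
    float[] sims = {0.2f, 0.9f, 0.5f, 0f, 0.4f, 0.3f, 0f, 0.1f, 0.6f, 0.95f, 0f};
    BitSet acceptDocs = new BitSet();
    acceptDocs.set(0, 11);
    acceptDocs.clear(9); // pretend doc 9 is deleted
    // Provisional hits: whichever child HNSW happened to reach first for each parent.
    List<Hit> hits = List.of(new Hit(4, 0.4f), new Hit(0, 0.2f), new Hit(7, 0.1f));
    List<Hit> result = rescoreBlocks(hits, parents, acceptDocs, new ForwardOnlyScorer(sims));
    System.out.println(result.stream().map(Hit::childDoc).toList()); // prints [1, 8, 4]
  }
}
```

The toy data plays out as follows:

- In the first block, HNSW surfaced child 0, but its sibling 1 scores higher and replaces it.
- In the third block, doc 9 would win but is excluded by `acceptDocs`, so doc 8 is kept.

Each rescored hit costs one similarity computation per accepted sibling. In this sketch, that is why the work grows with block width and with the number of hits rescored.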
