From the vector search side of things, nothing in the 9.11 changelog immediately stands out as a cause: https://lucene.apache.org/core/9_11_0/changes/Changes.html

The given query is just a regular kNN query, so its rewrite should behave the same way it did in 9.10. One significant change to kNN search behavior did happen in 9.10: https://github.com/apache/lucene/pull/12962. But since this issue doesn't happen in 9.10, I am at a loss.

Since `knn` rewrites itself to a `KnnScoreDoc` object, it's surprising that the result set should change between collecting and scoring.

I wonder if Solr adjusted due to this deprecation, or started using collector managers and inadvertently tripped over a bug? Or something was added in Apache Lucene 9.11 that lets the same kNN query over the same index return a different set of top-k docs. Though I would have thought the main candidate there would be https://github.com/apache/lucene/pull/12962 (in Lucene 9.10).

On Thu, Jan 23, 2025 at 3:46 AM Moll, Dr. Andreas <m...@juris.de.invalid> wrote:

> Hi,
>
> I want to inform you about a behavior change in SolR 9.6 (Lucene 9.10) vs.
> SolR 9.7 (Lucene 9.11) for vector searches.
>
> We heavily rely on vector searches for embeddings in combination with
> filter queries on the parent documents.
>
> Our queries in general looked like this:
>
> select?q={!knn f=vector topK=2048}[...]
> rows=100
> fq={!child of='childtype:root'}…
> start=0
> sort=score desc,ID desc
>
> With SolR 9.7 and higher, this results in ~10% of the queries producing
> the following error:
>
> java.lang.IllegalArgumentException: Doc id 27227879 doesn't match the query
>     at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
>     at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1812) ~[?:?]
>     at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2001) ~[?:?]
>     at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1775) ~[?:?]
>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:772) ~[?:?]
>     at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:767) ~[?:?]
>
> After several days of debugging, I confirmed that the number of errors
> correlates to the topK value:
>
> - k = 8 -> 44 errors
> - k = 2048 -> 17 errors
> - k = 16384 -> 1 error
>
> I found a workaround for the issue by modifying the sort parameter to:
>
> sort=score desc
>
> With this change, our queries work like a charm again. The initial thought
> of adding the ID desc sorting was to get more reproducible results, but
> it is not strictly necessary for us.
>
> Could you clarify if this change in SolR/Lucene was intended? If so,
> perhaps you want to add documentation on vector queries that adding an
> additional sorting might cause errors.
>
> Best regards,
> Dr. Andreas Moll
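
To make the hypothesis in the reply concrete, here is a toy Python sketch of the suspected failure mode. It is not Lucene's or Solr's code. It assumes only what the stack trace suggests: with a field-based sort, the top docs are collected first and their scores are filled in afterwards by running the query again. The function name `populate_scores` and all doc ids and scores below are made up for illustration.

```python
def populate_scores(top_doc_ids, second_pass_scores):
    """Toy stand-in for the score-population step (not Lucene's code).

    top_doc_ids: doc ids kept by the first, sort-driven collection pass.
    second_pass_scores: {doc_id: score} matched when the query runs again.
    """
    scores = {}
    for doc_id in sorted(top_doc_ids):
        if doc_id not in second_pass_scores:
            # Same message shape as the exception in the report.
            raise ValueError(f"Doc id {doc_id} doesn't match the query")
        scores[doc_id] = second_pass_scores[doc_id]
    return scores


# Stable case: both passes see the same top-k set, so every doc gets a score.
first_pass = [27, 3, 11]
stable = populate_scores(first_pass, {3: 0.91, 11: 0.88, 27: 0.80})

# Unstable case (the hypothesis): the second kNN run swaps doc 27 for doc 42,
# so doc 27 from the first pass can no longer be scored.
try:
    populate_scores(first_pass, {3: 0.91, 11: 0.88, 42: 0.81})
    error = None
except ValueError as exc:
    error = str(exc)
```

If this is roughly what happens, it would be consistent with the reported workaround: a plain `sort=score desc` presumably has scores available from the first pass and does not need the separate score-population step at all.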