Also, as a workaround, I can confirm that setting indexSearcherExecutor to 0 on the Solr side, as mentioned in https://issues.apache.org/jira/browse/SOLR-17642 , mitigates the problem for now.

On Thu, Jan 30, 2025 at 10:40 AM Benjamin Trent <ben.w.tr...@gmail.com> wrote:

> Yes, FYI, we found the bug in the kNN query:
> https://github.com/apache/lucene/issues/14180
>
> Basically, threads sharing information back for graph early termination
> can lead to inconsistency. We should fix this in Lucene, though I do not
> know the timeline or how simple the fix will be.
>
> Thank you Dr. Andreas Moll for bringing this to our attention!
>
> On Thu, Jan 30, 2025 at 1:26 PM Varun Thacker <va...@vthacker.in> wrote:
>
>> Benjamin - I think this has to do with Solr 9.7+ using thread executors
>> for searching.
>>
>> I can take Solr 9.7 or Solr 9.8 and just undo this one line in
>> SolrIndexSearcher
>> <https://github.com/apache/solr/blob/7af2ad56753bf75b8391639233dcc8d465767de9/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L385>
>> and the query doesn't fail:
>>
>> - super(wrapReader(core, r), core.getCoreContainer().getIndexSearcherExecutor());
>> + super(wrapReader(core, r));
>>
>> On Thu, Jan 23, 2025 at 6:49 AM Benjamin Trent <ben.w.tr...@gmail.com>
>> wrote:
>>
>>> From the vector search side of things, nothing immediately pops up as a
>>> cause. https://lucene.apache.org/core/9_11_0/changes/Changes.html
>>>
>>> The given query is just a regular kNN query, so its rewrite should
>>> behave the same way it did in 9.10.
>>>
>>> One significant change for kNN search behavior did happen in 9.10:
>>> https://github.com/apache/lucene/pull/12962 But since this issue
>>> doesn't happen in 9.10, I am at a loss.
>>>
>>> Since `knn` rewrites itself to a `KnnScoreDoc` object, it's surprising
>>> that the result set should change between collecting and scoring.
>>>
>>> I wonder if Solr adjusted due to this deprecation or started using
>>> collector managers and inadvertently tripped over a bug or something?
>>>
>>> Or, something was added in Apache Lucene 9.11 where the same knn query
>>> over the same index could result in a different set of top-k docs.
>>> Though, I would have thought the main candidate there would be:
>>> https://github.com/apache/lucene/pull/12962 (in Lucene 9.10).
>>>
>>> On Thu, Jan 23, 2025 at 3:46 AM Moll, Dr. Andreas <m...@juris.de.invalid>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to inform you about a behavior change in Solr 9.6 (Lucene 9.10)
>>>> vs. Solr 9.7 (Lucene 9.11) for vector searches.
>>>>
>>>> We heavily rely on vector searches for embeddings in combination with
>>>> filter queries on the parent documents.
>>>>
>>>> Our queries in general looked like this:
>>>>
>>>> select?q={!knn f=vector topK=2048}[...]
>>>> rows=100
>>>> fq={!child of='childtype:root'}…
>>>> start=0
>>>> sort=score desc,ID desc
>>>>
>>>> With Solr 9.7 and higher, this results in ~10% of the queries producing
>>>> the following error:
>>>>
>>>> java.lang.IllegalArgumentException: Doc id 27227879 doesn't match the query
>>>>   at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
>>>>   at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1812) ~[?:?]
>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2001) ~[?:?]
>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1775) ~[?:?]
>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:772) ~[?:?]
>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:767) ~[?:?]
>>>>
>>>> After several days of debugging, I confirmed that the number of errors
>>>> correlates with the topK value:
>>>>
>>>> - k = 8 -> 44 errors
>>>> - k = 2048 -> 17 errors
>>>> - k = 16384 -> 1 error
>>>>
>>>> I found a workaround for the issue by modifying the sort parameter to:
>>>>
>>>> sort=score desc
>>>>
>>>> With this change, our queries work like a charm again. The initial
>>>> thought of adding the ID desc sorting was to get more reproducible
>>>> results, but it is not strictly necessary for us.
>>>>
>>>> Could you clarify if this change in Solr/Lucene was intended? If so,
>>>> perhaps you want to add documentation on vector queries that adding an
>>>> additional sorting might cause errors.
>>>>
>>>> Best regards,
>>>> Dr. Andreas Moll
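
For anyone reading this thread later, the shape of the failure can be sketched without Lucene. A first pass picks the top hits, and a second pass re-runs the query to fill in scores. If the two runs disagree on the matching set, the second pass meets a doc id it cannot score. Everything in the sketch below (the class name, the method, the doc ids, the scores) is hypothetical and only mimics the error message quoted in the thread; it is not Lucene's or Solr's actual code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the two-pass flow discussed in this thread; not Lucene or Solr code. */
public class TwoPassScoreSketch {

    /**
     * Second pass: look up a score for every doc id chosen by the first pass.
     * Throws if a doc id is absent from the second run's matches, mirroring the
     * shape of the "doesn't match the query" error quoted in the thread.
     */
    public static float[] populateScores(List<Integer> firstPassDocIds,
                                         Map<Integer, Float> secondRunMatches) {
        float[] scores = new float[firstPassDocIds.size()];
        for (int i = 0; i < scores.length; i++) {
            int docId = firstPassDocIds.get(i);
            Float score = secondRunMatches.get(docId);
            if (score == null) {
                throw new IllegalArgumentException("Doc id " + docId + " doesn't match the query");
            }
            scores[i] = score;
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Integer> firstPass = List.of(3, 7, 9);

        // Deterministic case: both runs agree on the matching set, so scoring succeeds.
        Map<Integer, Float> sameSet = Map.of(3, 0.9f, 7, 0.8f, 9, 0.7f);
        System.out.println(populateScores(firstPass, sameSet).length);

        // Non-deterministic case: the second run returned doc 11 instead of doc 9.
        Map<Integer, Float> differentSet = Map.of(3, 0.9f, 7, 0.8f, 11, 0.75f);
        try {
            populateScores(firstPass, differentSet);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This would also be consistent with the two workarounds reported above: sorting by score only presumably avoids the second scoring pass, and running the search without the executor removes the cross-thread sharing that makes the two runs differ.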