From the vector search side of things, nothing in the 9.11 changelog
immediately pops up as a cause:
https://lucene.apache.org/core/9_11_0/changes/Changes.html

The given query is just a regular kNN query, so its rewrite should behave
the same way it did in 9.10.

One significant change to kNN search behavior did happen in 9.10:
https://github.com/apache/lucene/pull/12962
But since this issue doesn't happen in 9.10, I am at a loss.

Since `knn` rewrites itself to a `KnnScoreDoc` object, it's surprising that
the result set would change between collecting and scoring.
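
To make that concrete, here is a toy model of the failure I have in mind.
It is not Lucene or Solr code: `populateScores` below is a hypothetical
stand-in for a pass that runs the query a second time to fetch scores for
docs that were already collected, and the doc ids are made up. The only
assumption is that the second execution can return a different top-k set
than the first.

```java
import java.util.List;
import java.util.Set;

/** Toy model only: none of this is Lucene or Solr code. */
class RescoreMismatchSketch {

    /**
     * Hypothetical stand-in for a score-population pass: every doc id that
     * was collected in the first pass must still match when the query is
     * executed a second time.
     */
    static void populateScores(List<Integer> collectedDocIds, Set<Integer> secondPassMatches) {
        for (int docId : collectedDocIds) {
            if (!secondPassMatches.contains(docId)) {
                throw new IllegalArgumentException("Doc id " + docId + " doesn't match the query");
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> collected = List.of(3, 7, 11);

        // Second pass returns the same top-k set: nothing happens.
        populateScores(collected, Set.of(3, 7, 11));

        // Second pass swapped doc 11 for doc 12: the collected doc is rejected.
        try {
            populateScores(collected, Set.of(3, 7, 12));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the two passes always agreed, the exception in the report could not be
reached this way, which is why a non-repeatable top-k set is my main
suspect.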

I wonder if Solr adjusted due to this deprecation or started using
collector managers and inadvertently tripped over a bug or something?

Or something was added in Apache Lucene 9.11 that lets the same knn query
over the same index return a different set of top-k docs. Though I would
have thought the main candidate there would be
https://github.com/apache/lucene/pull/12962 (in Lucene 9.10).

On Thu, Jan 23, 2025 at 3:46 AM Moll, Dr. Andreas <m...@juris.de.invalid>
wrote:

> Hi,
>
> I want to inform you about a behavior change in Solr 9.6 (Lucene 9.10) vs.
> Solr 9.7 (Lucene 9.11) for vector searches.
>
> We heavily rely on vector searches for embeddings in combination with
> filter queries on the parent documents.
>
> Our queries in general looked like this:
>
> select?q={!knn f=vector topK=2048}[...]
> rows=100
> fq={!child of='childtype:root'}…
> start=0
> sort=score desc,ID desc
>
> With Solr 9.7 and higher, about 10% of these queries fail with the
> following error:
>
> java.lang.IllegalArgumentException: Doc id 27227879 doesn't match the query
>         at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
>         at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1812) ~[?:?]
>         at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2001) ~[?:?]
>         at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1775) ~[?:?]
>         at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:772) ~[?:?]
>         at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:767) ~[?:?]
>
> After several days of debugging, I confirmed that the number of errors
> correlates inversely with the topK value:
>
>    - k = 8 -> 44 errors
>    - k = 2048 -> 17 errors
>    - k = 16384 -> 1 error
>
> I found a workaround for the issue by modifying the sort parameter to:
>
> sort=score desc
>
> With this change, our queries work like a charm again. The initial thought
> of adding the ID desc sorting was to get more reproducible results, but
> it is not strictly necessary for us.
>
> Could you clarify whether this change in Solr/Lucene was intended? If so,
> perhaps the vector query documentation should note that adding a
> secondary sort can cause errors.
>
> Best regards,
> Dr. Andreas Moll
>
>
>
