Re: Error Doc id doesn't match the query in vector searches

Varun Thacker Fri, 17 Jan 2025 14:13:20 -0800

Lastly, it looks like there were some changes to the collector executor in
https://github.com/apache/solr/commit/7405bb19fa424ec79c6daaafd986670dc54d7dfe
that fix the issue. I can no longer repro it.


So we don't need to create a Solr Jira after all.
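
For readers landing on this thread from the archive, here is a minimal sketch
of the failure mode discussed below. It is a toy model in plain Java, not
Lucene or Solr code: DocSetQuery and this populateScores are hypothetical
stand-ins, and the idea that the second rewrite yields a different doc set
than the first is an assumption that the thread does not confirm. It only
shows how a hit collected from one rewritten query can fail the match check
when scores are populated against another.

```java
import java.util.Arrays;

/** Toy model (not Lucene code) of scoring hits against a second, different rewrite. */
public class PopulateScoresSketch {

    /** Hypothetical stand-in for a rewritten kNN query: a fixed set of matching doc ids. */
    public static final class DocSetQuery {
        private final int[] docs; // kept sorted so binarySearch is valid

        public DocSetQuery(int... docs) {
            this.docs = docs.clone();
            Arrays.sort(this.docs);
        }

        public boolean matches(int docId) {
            return Arrays.binarySearch(docs, docId) >= 0;
        }
    }

    /** Mimics the check behind the error message: every hit must match the scoring query. */
    public static void populateScores(int[] topDocs, DocSetQuery rewritten) {
        for (int doc : topDocs) {
            if (!rewritten.matches(doc)) {
                throw new IllegalArgumentException("Doc id " + doc + " doesn't match the query");
            }
        }
    }

    public static void main(String[] args) {
        int[] hits = {3, 7, 42}; // hits collected from the first (search) rewrite

        // Scoring against the same doc set succeeds silently.
        populateScores(hits, new DocSetQuery(3, 7, 42));

        // Assumption: a second rewrite produced a slightly different top-k.
        try {
            populateScores(hits, new DocSetQuery(3, 7, 41));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints: Doc id 42 doesn't match the query
        }
    }
}
```

In this model the error goes away whenever both steps see the same doc set,
which would fit the observation that the problem shows up only for some queries.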

On Fri, Jan 17, 2025 at 11:44 AM Varun Thacker <[email protected]> wrote:

> I was able to narrow it down to
> https://github.com/apache/solr/commit/cfec121bab2ecfc4c06e20a5533596025ae63d98
> as the change that causes this issue.
>
> Without that change the bug doesn't repro
>
> On Thu, Jan 16, 2025 at 6:49 PM Varun Thacker <[email protected]> wrote:
>
>> I misspoke: for a regular search, KnnFloatVectorQuery is the Query object
>> before the rewrite. After the rewrite it's
>> AbstractKnnVectorQuery$DocAndScoreQuery.
>>
>> Then, when Solr asks for the score, the same Query object is passed to
>> the rewrite and becomes an AbstractKnnVectorQuery$DocAndScoreQuery.
>>
>> I'll try looking with some fresh eyes tomorrow
>>
>> On Thu, Jan 16, 2025 at 6:00 PM Varun Thacker <[email protected]> wrote:
>>
>>> I'll have to recreate my setup, since I tried rebuilding Solr
>>> without some PRs and it wiped everything out (my mistake!).
>>>
>>> I was able to capture the query Solr sends for the search
>>> (KnnFloatVectorQuery) and the one it uses for getting the score
>>> (AbstractKnnVectorQuery$DocAndScoreQuery). This might give some
>>> breadcrumbs to Mike while I try to look into it more tomorrow.
>>>
>>>
>>> query = {KnnFloatVectorQuery@10048} "KnnFloatVectorQuery:value[0.1234,...][160000]"
>>>  target = {float[768]@10064} [...
>>>  field = "value"
>>>  k = 160000
>>>  filter = null
>>>  isDeprecatedRewriteMethodOverridden = false
>>>  CLASS_NAME_HASH = 1536329572
>>>
>>> query = {AbstractKnnVectorQuery$DocAndScoreQuery@10074} "DocAndScore[160000]"
>>>  k = 160000
>>>  docs = {int[160000]@10084} [... more]
>>>  scores = {float[160000]@10085} [...]
>>>  contextIdentity = {Object@10087}
>>>  isDeprecatedRewriteMethodOverridden = false
>>>  CLASS_NAME_HASH = 1706435309
>>>
>>> On Thu, Jan 16, 2025 at 5:28 PM Varun Thacker <[email protected]> wrote:
>>>
>>>> I have an index where I can repro it with 100% success. Let me look
>>>> into what's causing it and create a Solr Jira
>>>>
>>>> On Mon, Oct 21, 2024 at 11:11 AM Michael Sokolov <[email protected]>
>>>> wrote:
>>>>
>>>>> I think this might be a better question for solr-user@? E.g., I don't
>>>>> understand how Solr decides which Query to send to populateScores --
>>>>> is it the same one that was used to generate the matches in topDocs?
>>>>> It seems as if it should be, but then this error shouldn't happen ...
>>>>> I wonder if you can print out the queries sent to search() and to
>>>>> populateScores()?
>>>>>
>>>>> On Thu, Oct 17, 2024 at 5:29 AM Moll, Dr. Andreas
>>>>> <[email protected]> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > we are currently testing Solr 9.7 and experiencing an error we have
>>>>> not seen before with Solr 9.6.1, and we think the problem might occur in
>>>>> the underlying Lucene code base:
>>>>> >
>>>>> > ERROR o.a.s.h.RequestHandlerBase Server exception =>
>>>>> > at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478)
>>>>> > java.lang.IllegalArgumentException: Doc id 48567944 doesn't match the query
>>>>> > at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
>>>>> > at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1766) ~[?:?]
>>>>> > at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1955) ~[?:?]
>>>>> > at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1729) ~[?:?]
>>>>> > at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:726) ~[?:?]
>>>>> > at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:721) ~[?:?]
>>>>> > at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1690) ~[?:?]
>>>>> > at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:432) ~[?:?]
>>>>> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:456) ~[?:?]
>>>>> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:226) ~[?:?]
>>>>> >
>>>>> > We index the embeddings as nested fields and can reproduce the error
>>>>> with the following code:
>>>>> >
>>>>> > We are able to reproduce the error using the same query. It seems to
>>>>> occur in approximately 5% of all vector queries.
>>>>> > We have one server running Solr 9.7 and three servers running Solr
>>>>> 9.6.1, all working on the same frozen index.
>>>>> > Only the Solr 9.7 server encounters the issue. We can rule out Java
>>>>> 21 and the corresponding optimizations or the new multithreading parameter
>>>>> as the root cause of the problem.
>>>>> > The index contains the document referenced in the error message.
>>>>> >
>>>>> >
>>>>> > String q = "{!knn f=vector topK=20}[0.031046804...";
>>>>> > SolrQuery sq = new SolrQuery("{!cache=false}" + q);
>>>>> > sq.addField("score"); // no error without the score field
>>>>> > sq.setRows(14); // Defect document must be included in result
>>>>> > // Order is not important, but ID is. No error e.g. with score
>>>>> > sq.setSort("ID", ORDER.asc);
>>>>> > final QueryRequest r = new QueryRequest(sq, METHOD.POST);
>>>>> > SolrClient solrClient = SolRConnector.createServer(server);
>>>>> > QueryResponse response = r.process(solrClient);
>>>>> >
>>>>> > Is there any additional information we can provide to help resolve
>>>>> this error?
>>>>> >
>>>>> > Best regards
>>>>> >
>>>>> > Andreas Moll