After debugging a little I can confirm that the dedup is happening in QueryComponent.mergeIds.
Distributed search has always done a quick-n-dirty dedup (i.e. it's
considered an error condition to have the same ID in different shards
anyway).
Actually, in our case the two documents with the same ID are in the same shard. They are routed to the same shard because they have the same ID. Remember that I tweak my request params (basically setting overwrite=false) so that I end up in indexWriter.addDocument (for both documents) in DirectUpdateHandler2 instead of indexWriter.updateDocument.

There is a small inconsistency: the dedup is not reflected in the total numFound unless the duplicate documents are actually returned by your query. Simple example: I have only two documents in my entire collection (which consists of several shards). They both live in the same shard and have the same ID (in fact they are complete duplicates). I get this funny behavior when searching:
* Searching with rows=0 or rows=1, I get numFound=2 back, and in the case of rows=1 I get the document (or one of them).
* Searching with rows>=2, I get numFound=1 back, and the document (or one of them).
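
To make the effect concrete, here is a minimal sketch of how I understand the merge to behave. It is a simplified model written for illustration, not Solr's actual code: the class MergeSketch, its merge method and the Merged holder are all hypothetical names, and I assume that each shard reports its full hit count but only returns up to `rows` IDs, so a duplicate can only be subtracted from numFound when both copies come back.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the behaviour described above; NOT Solr's actual code.
final class MergeSketch {

    /** Merged result: the reported hit count and the IDs handed back to the client. */
    static final class Merged {
        final long numFound;
        final List<String> ids;

        Merged(long numFound, List<String> ids) {
            this.numFound = numFound;
            this.ids = ids;
        }
    }

    /**
     * @param shardIds for each shard, the IDs of all matching documents (duplicates allowed)
     * @param rows     number of documents requested
     */
    static Merged merge(List<List<String>> shardIds, int rows) {
        long numFound = 0;
        Set<String> seen = new HashSet<>();
        List<String> merged = new ArrayList<>();

        for (List<String> shard : shardIds) {
            // Each shard reports its full hit count, duplicates included.
            numFound += shard.size();

            // Only the top `rows` IDs are actually sent to the merging node.
            int returned = Math.min(rows, shard.size());
            for (String id : shard.subList(0, returned)) {
                if (!seen.add(id)) {
                    // A duplicate is only noticed (and subtracted) if it was returned.
                    numFound--;
                    continue;
                }
                merged.add(id);
            }
        }

        List<String> page = merged.subList(0, Math.min(rows, merged.size()));
        return new Merged(numFound, new ArrayList<>(page));
    }
}
```

With one shard holding the ID "doc1" twice and the other shards empty, this model gives numFound=2 for rows=0 and rows=1, and numFound=1 for rows>=2, which matches what I see.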
It should be in QueryComponent.mergeIds

-Yonik


