On Fri, Oct 18, 2013 at 4:35 PM, Per Steffensen <[email protected]> wrote:
> It is a distributed search! So where in the code is the de-dup happening for
> distributed searches? And is it correct that this is new in 4.4.0 (vs
> 4.0.0)? Or did I just accidentally change my config to turn it on?

Distributed search has always done a quick-n-dirty dedup (i.e. it's
considered an error condition to have the same ID in different shards
anyway).
It should be in QueryComponent.mergeIds.
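
To illustrate the kind of dedup meant here, below is a minimal sketch, not the actual Solr code. The class MergeIdsSketch, the ShardHit type, and this simplified mergeIds signature are all hypothetical; the only idea taken from the thread is that, while merging shard responses, a hit whose unique ID has already been seen is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeIdsSketch {

    /** Hypothetical stand-in for one hit returned by a shard (not a Solr class). */
    static final class ShardHit {
        final String shard;
        final String id;
        final float score;

        ShardHit(String shard, String id, float score) {
            this.shard = shard;
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists, keeping only the first hit seen for each
     * unique ID, then orders the survivors by descending score.
     * Assumption: a simplified illustration, not Solr's real merge logic.
     */
    static List<ShardHit> mergeIds(List<List<ShardHit>> shardResponses) {
        Set<String> seenIds = new HashSet<>();
        List<ShardHit> merged = new ArrayList<>();
        for (List<ShardHit> response : shardResponses) {
            for (ShardHit hit : response) {
                // Set.add returns false when the ID was already present,
                // so any later hit with the same ID is silently dropped.
                if (seenIds.add(hit.id)) {
                    merged.add(hit);
                }
            }
        }
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        // Two identical documents (same ID) in one shard, as in the report below.
        List<ShardHit> shard1 = Arrays.asList(
                new ShardHit("shard1", "doc1", 1.0f),
                new ShardHit("shard1", "doc1", 1.0f));
        List<ShardHit> shard2 = Arrays.asList(
                new ShardHit("shard2", "doc2", 0.5f));

        List<ShardHit> merged = mergeIds(Arrays.asList(shard1, shard2));
        for (ShardHit hit : merged) {
            System.out.println(hit.id + " from " + hit.shard);
        }
    }
}
```

With a merge of this shape, a shard can return both copies of a document and the client still sees only one, which would match the symptom described in the original report.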

-Yonik



> Regards, Per Steffensen
>
>
> On 10/18/13 10:09 PM, Yonik Seeley wrote:
>>
>> AFAIK, the only dedup that is done on purpose is during distributed
>> search.
>> So either a distributed search is happening, or there has been some
>> other change that accidentally started de-duping (such as some sort of
>> map from ID to Doc for other reasons).
>>
>> -Yonik
>>
>>
>> On Fri, Oct 18, 2013 at 4:03 PM, Per Steffensen <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> I send update/add requests to Solr in such a way that
>>> indexWriter.addDocument is used in DirectUpdateHandler2 instead of
>>> indexWriter.updateDocument. In two separate requests I send two
>>> identical documents to Solr. In Solr 4.0.0 I get both documents back
>>> when I search. In Solr 4.4.0 I get only one document back. I have
>>> investigated a little what happens in Solr 4.4.0, and I believe
>>> both documents are actually in the Lucene indices (in
>>> QueryComponent.process the searcher.search line returns two docs for
>>> one of my shards). So it must be somewhere in the search flow that it
>>> is decided to send only one of them back to the client. In Solr 4.0.0
>>> I get both back to the client.
>>>
>>> Is this known/intended behavior? Can someone point me to the code where
>>> "duplicates" are filtered, and/or to the JIRA issue where this feature
>>> was introduced? Not that I necessarily want to do it, but can this
>>> search dedup be turned off?
>>>
>>> Regards, Per Steffensen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
