Alternatively, would it make sense for overwrite=false to _skip_ the
UpdateLog if it is present (and assuming you're not using CDCR since that's
based on the UpdateLog)?  I don't know.
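
To make the question concrete, here is a toy sketch of what "skip the
UpdateLog when overwrite=false" could mean.  This is plain Java, not Solr
code: ToyIndex, its fields, and its methods are all made up for
illustration, and the real update path is far more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; not Solr code. Every name here is hypothetical. */
public class ToyIndex {
    private final List<String[]> docs = new ArrayList<>();    // each entry is {id, body}
    private final List<String> updateLog = new ArrayList<>(); // ids recorded for replay

    /**
     * overwrite=true: remove any earlier doc with the same id, then log and add.
     * overwrite=false: blind append; in this sketch the log is skipped entirely.
     */
    public void add(String id, String body, boolean overwrite) {
        if (overwrite) {
            docs.removeIf(d -> d[0].equals(id)); // the per-add cost overwrite=false avoids
            updateLog.add(id);
        }
        docs.add(new String[] {id, body});
    }

    /** Number of docs currently stored under the given id. */
    public long count(String id) {
        return docs.stream().filter(d -> d[0].equals(id)).count();
    }

    public int logSize() {
        return updateLog.size();
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("1", "bulk", false);
        idx.add("1", "bulk again", false); // caller broke the "already unique" assumption
        System.out.println(idx.count("1") + " copies, log size " + idx.logSize());
        idx.add("1", "incremental", true); // a normal add restores uniqueness
        System.out.println(idx.count("1") + " copies, log size " + idx.logSize());
    }
}
```

Even in the toy, anything that replays the log never sees the
overwrite=false adds, which is the same reason the CDCR caveat applies.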

On Thu, Sep 20, 2018 at 3:18 PM David Smiley <[email protected]>
wrote:

> Is it even sensible to want overwrite=false and have an UpdateLog?  That
> is, isn't the weight of the UpdateLog well more than whatever savings are
> had with overwrite=false?  I suspect that combining these two today
> has edge cases we don't even realize, despite the apparent lack of
> exceptions.
>
> On Thu, Sep 20, 2018 at 3:11 PM Yonik Seeley <[email protected]> wrote:
>
>> On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]>
>> wrote:
>>
>>> Thanks for the context.
>>>
>>> I'd like to do a few things:
>>> * Document that "overwrite" is a (potentially dangerous) performance hack
>>> for documents that are assumed to be already unique.  It is not to be used
>>> to deliberately violate the uniqueKey constraint; this is considered
>>> erroneous and unsupported use.
>>>
>>
>> +1
>>
>>
>>> * Document *and enforce* that "overwrite" does not work with the
>>> UpdateLog.  User error; let them know.  While we maintain the UpdateLog, I
>>> don't want to have the complexity burden of considering how to support
>>> overwrite=false.  I'm not saying it's super complex, only that UpdateLog is
>>> already complex and I don't think the value of overwrite is good enough for
>>> me to want to maintain the two together.  I hope others can appreciate this
>>> point; I don't wish to be difficult.  If someone volunteers to make it
>>> work in a way that isn't complex then go for it.  *It appears it might
>>> work today but I wish to break this.*
>>>
>>
>> Throwing an exception for overwrite=false when using UpdateLog will
>> impact some users though (back compat break and performance regression), so
>> I'd like to understand what we gain, and if it's enough of a gain to
>> compensate.  If updating nested docs doesn't immediately implement
>> overwrite=false, it doesn't seem like a big deal.  Prohibiting it from ever
>> working on the other hand, doesn't seem like the right trade-off.
>>
>>
>>> * Consequently, ConvertedLegacyTest needs fixing.  If the intent of the
>>> legacy test is to see that the document can be added and violate the
>>> uniqueKey, then this test needs to use a config without the UpdateLog.  Or
>>> we keep the default config (with UpdateLog) and adjust the test's
>>> expectations.
>>>
>>
>> +1
>>
>> -Yonik
>>
>>
>>>
>>> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>>>
>>>> Yep, it's only for performance. I know a number of people using
>>>> overwrite=false when doing bulk indexing, and then often later using normal
>>>> adds for incremental changes.
>>>>
>>>> As far as why "overwrite(Pending|Committed)?" exists at all: it's been
>>>> there since Solr was open sourced (SOLR-1), so there wouldn't be a
>>>> discussion to find.  Lucene had no concept of unique IDs or overwriting at
>>>> the time and it was all implemented in Solr-land.  The cost to enforce was
>>>> significant (and still can be today), and often unneeded when building an
>>>> index from a source known to have unique IDs already.
>>>>
>>>> -Yonik
>>>>
>>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
