Is it even sensible to want overwrite=false and have an UpdateLog?  That
is, isn't the weight of the UpdateLog well more than whatever savings are
had with overwrite=false?  I suspect that combining these two today
has edge cases we don't even realize, despite the apparent lack of
exceptions.

On Thu, Sep 20, 2018 at 3:11 PM Yonik Seeley <[email protected]> wrote:

> On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]>
> wrote:
>
>> Thanks for the context.
>>
>> I'd like to do a few things:
>> * Document that "overwrite" is a (potentially dangerous) performance hack
>> for documents that are assumed to be already unique.  It is not to be used
>> to deliberately violate the uniqueKey constraint; this is considered
>> erroneous and unsupported use.
>>
>
> +1
>
>
>> * Document *and enforce* that "overwrite" does not work with the
>> UpdateLog.  User error; let them know.  While we maintain the UpdateLog, I
>> don't want to have the complexity burden of considering how to support
>> overwrite=false.  I'm not saying it's super complex, only that UpdateLog is
>> already complex and I don't think the value of overwrite is good enough for
>> me to want to maintain the two together.  I hope others can appreciate this
>> point; I don't wish to be difficult.  If someone volunteers to make it
>> work in a way that isn't complex then go for it.  *It appears it might
>> work today but I wish to break this.*
>>
>
> Throwing an exception for overwrite=false when using UpdateLog will impact
> some users though (back compat break and performance regression), so I'd
> like to understand what we gain, and if it's enough of a gain to
> compensate.  If updating nested docs doesn't immediately implement
> overwrite=false, it doesn't seem like a big deal.  Prohibiting it from ever
> working on the other hand, doesn't seem like the right trade-off.
>
>
>> * Consequently, ConvertedLegacyTest needs fixing.  If the intent of the
>> legacy test is to see that the document can be added and violate the
>> uniqueKey, then this test needs to use a config without the UpdateLog.  Or
>> we keep the default config (with UpdateLog) and adjust the test's
>> expectations.
>>
>
> +1
>
> -Yonik
>
>
>>
>> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>>
>>> Yep, it's only for performance. I know a number of people using
>>> overwrite=false when doing bulk indexing, and then often later using normal
>>> adds for incremental changes.
>>>
>>> As far as why "overwrite(Pending|Committed)?" exists at all: it's been
>>> there since Solr was open sourced (SOLR-1), so there wouldn't be a
>>> discussion to find.  Lucene had no concept of unique IDs or overwriting at
>>> the time and it was all implemented in Solr-land.  The cost to enforce was
>>> significant (and still can be today), and often unneeded when building an
>>> index from a source known to have unique IDs already.
>>>
>>> -Yonik
>>>
>>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
