On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]>
wrote:

> Thanks for the context.
>
> I'd like to do a few things:
> * Document that "overwrite" is a (potentially dangerous) performance hack
> for documents that are assumed to be already unique.  It is not to be used
> to deliberately violate the uniqueKey constraint; this is considered
> erroneous and unsupported use.
>

+1


> * Document *and enforce* that "overwrite" does not work with the
> UpdateLog.  User error; let them know.  While we maintain the UpdateLog, I
> don't want to have the complexity burden of considering how to support
> overwrite=false.  I'm not saying it's super complex, only that UpdateLog is
> already complex and I don't think the value of overwrite is good enough for
> me to want to maintain the two together.  I hope others can appreciate this
> point; I don't wish to be difficult.  If someone volunteers to make it
> work in a way that isn't complex then go for it.  *It appears it might
> work today but I wish to break this.*
>

Throwing an exception for overwrite=false when using UpdateLog will impact
some users though (back compat break and performance regression), so I'd
like to understand what we gain, and if it's enough of a gain to
compensate.  If updating nested docs doesn't immediately implement
overwrite=false, it doesn't seem like a big deal.  Prohibiting it from ever
working, on the other hand, doesn't seem like the right trade-off.
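
For anyone following along, here is a minimal sketch of the usage pattern
under discussion: a bulk load sent with overwrite="false" on the XML <add>
message, followed by normal adds for incremental changes.  The helper name
build_add_message is hypothetical and only builds the message text; it sends
nothing to Solr, and the field names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, overwrite=True):
    """Build a Solr XML <add> message (hypothetical helper, illustration only).

    docs is a list of dicts mapping field name -> value.
    """
    add = ET.Element("add")
    if not overwrite:
        # Skip the uniqueKey lookup/delete; only safe when the source
        # is already known to have unique IDs.
        add.set("overwrite", "false")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Bulk load from a source assumed to have unique IDs.
bulk = build_add_message([{"id": "1"}, {"id": "2"}], overwrite=False)

# Later incremental change: a normal add, which overwrites by uniqueKey.
incremental = build_add_message([{"id": "1", "title": "updated"}])
```

The point of the sketch is that the same client typically mixes both modes,
which is why an exception on overwrite=false would be a visible break.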


> * Consequently, ConvertedLegacyTest needs fixing.  If the intent of the
> legacy test is to see that the document can be added and violate the
> uniqueKey, then this test needs to use a config without the UpdateLog.  Or
> we keep the default config (with UpdateLog) and adjust the test's
> expectations.
>

+1

-Yonik


>
> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>
>> Yep, it's only for performance. I know a number of people using
>> overwrite=false when doing bulk indexing, and then often later using normal
>> adds for incremental changes.
>>
>> As far as why "overwrite(Pending|Committed)?" exists at all: it's been
>> there since Solr was open sourced (SOLR-1), so there wouldn't be a
>> discussion to find.  Lucene had no concept of unique IDs or overwriting at
>> the time and it was all implemented in Solr-land.  The cost to enforce was
>> significant (and still can be today), and often unneeded when building an
>> index from a source known to have unique IDs already.
>>
>> -Yonik
>>
>> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
