On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]> wrote:
> Thanks for the context.
>
> I'd like to do a few things:
> * Document that "overwrite" is a (potentially dangerous) performance hack
> for documents that are assumed to be already unique. It is not to be used
> to deliberately violate the uniqueKey constraint; this is considered
> erroneous and unsupported use.
>

+1

> * Document *and enforce* that "overwrite" does not work with the
> UpdateLog. User error; let them know. While we maintain the UpdateLog, I
> don't want to have the complexity burden of considering how to support
> overwrite=true. I'm not saying it's super complex, only that UpdateLog is
> already complex and I don't think the value of overwrite is good enough for
> me to want to maintain the two together. I hope others can appreciate this
> point; I don't wish to be difficult. If someone volunteers to make it
> work in a way that isn't complex then go for it. *It appears it might
> work today but I wish to break this.*
>

Throwing an exception for overwrite=false when using UpdateLog will impact
some users though (back-compat break and performance regression), so I'd
like to understand what we gain, and if it's enough of a gain to compensate.
If updating nested docs doesn't immediately implement overwrite=false, it
doesn't seem like a big deal. Prohibiting it from ever working, on the other
hand, doesn't seem like the right trade-off.

> * Consequently, ConvertedLegacyTest needs fixing. If the intent of the
> legacy test is to see that the document can be added and violate the
> uniqueKey, then this test needs to use a config without the UpdateLog. Or
> we keep the default config (with UpdateLog) and adjust the test's
> expectations.
>

+1

-Yonik

>
> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>
>> Yep, it's only for performance. I know a number of people using
>> overwrite=false when doing bulk indexing, and then often later using normal
>> adds for incremental changes.
>>
>> As far as why "overwrite(Pending|Committed)?" exists at all: it's been
>> there since Solr was open sourced (SOLR-1), so there wouldn't be a
>> discussion to find. Lucene had no concept of unique IDs or overwriting at
>> the time and it was all implemented in Solr-land. The cost to enforce was
>> significant (and still can be today), and often unneeded when building an
>> index from a source known to have unique IDs already.
>>
>> -Yonik
>>
>> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
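For readers following along, here is a toy sketch of the two add paths being discussed, written in Java purely for illustration. Everything in it is hypothetical: `OverwriteSketch`, `ToyIndex`, `Doc`, and the update-log guard are not Solr classes or APIs, and the in-memory list is not how Solr's update handler works. It only shows the trade-off in the thread: overwrite=true pays for a uniqueKey lookup on every add, overwrite=false skips it (and lets duplicates in if the caller's ids are not actually unique), and the guard models the proposed "enforce" behavior, not anything Solr does today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Solr code: illustrates overwrite=true vs overwrite=false.
public class OverwriteSketch {

    // A minimal document: uniqueKey value plus a body.
    record Doc(String id, String body) {}

    static class ToyIndex {
        private final List<Doc> docs = new ArrayList<>();
        private final boolean updateLogEnabled; // hypothetical flag standing in for the UpdateLog
        long idLookups = 0;                     // how many uniqueness checks were paid for

        ToyIndex(boolean updateLogEnabled) {
            this.updateLogEnabled = updateLogEnabled;
        }

        void add(Doc doc, boolean overwrite) {
            if (overwrite) {
                // Enforce uniqueKey: find and remove any existing doc with the same id.
                idLookups++;
                docs.removeIf(d -> d.id().equals(doc.id()));
            } else if (updateLogEnabled) {
                // Hypothetical guard modeling the proposal to reject overwrite=false with an UpdateLog.
                throw new IllegalArgumentException(
                        "overwrite=false is not supported with the update log");
            }
            // overwrite=false path: no lookup, just append (caller promises unique ids).
            docs.add(doc);
        }

        long count(String id) {
            return docs.stream().filter(d -> d.id().equals(id)).count();
        }
    }

    public static void main(String[] args) {
        // Bulk load with overwrite=false and a (mis)repeated id: duplicates slip in.
        ToyIndex bulk = new ToyIndex(false);
        bulk.add(new Doc("1", "a"), false);
        bulk.add(new Doc("1", "b"), false);
        System.out.println(bulk.count("1") + " docs, " + bulk.idLookups + " lookups");

        // Normal adds: the second add replaces the first.
        ToyIndex normal = new ToyIndex(false);
        normal.add(new Doc("1", "a"), true);
        normal.add(new Doc("1", "b"), true);
        System.out.println(normal.count("1") + " docs, " + normal.idLookups + " lookups");

        // With the hypothetical guard, overwrite=false is rejected outright.
        ToyIndex logged = new ToyIndex(true);
        try {
            logged.add(new Doc("1", "a"), false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` prints `2 docs, 0 lookups`, then `1 docs, 2 lookups`, then the rejection message. The third case is exactly the back-compat and performance concern raised above: a bulk loader that was correctly relying on overwrite=false would start failing.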
