Alternatively, would it make sense for overwrite=false to _skip_ the UpdateLog if it is present (and assuming you're not using CDCR since that's based on the UpdateLog)? I don't know.

On Thu, Sep 20, 2018 at 3:18 PM David Smiley <[email protected]> wrote:

> Is it even sensible to want overwrite=false and have an UpdateLog? That
> is, isn't the weight of the UpdateLog well more than whatever savings are
> had with overwrite=false? I suspect that combining these two today has
> edge cases we don't even realize, despite the apparent lack of
> exceptions.
>
> On Thu, Sep 20, 2018 at 3:11 PM Yonik Seeley <[email protected]> wrote:
>
>> On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]>
>> wrote:
>>
>>> Thanks for the context.
>>>
>>> I'd like to do a few things:
>>> * Document that "overwrite" is a (potentially dangerous) performance
>>> hack for documents that are assumed to be already unique. It is not to
>>> be used to deliberately violate the uniqueKey constraint; this is
>>> considered erroneous and unsupported use.
>>>
>>
>> +1
>>
>>
>>> * Document *and enforce* that "overwrite" does not work with the
>>> UpdateLog. User error; let them know. While we maintain the UpdateLog,
>>> I don't want to have the complexity burden of considering how to
>>> support overwrite=true. I'm not saying it's super complex, only that
>>> UpdateLog is already complex and I don't think the value of overwrite
>>> is good enough for me to want to maintain the two together. I hope
>>> others can appreciate this point; I don't wish to be difficult. If
>>> someone volunteers to make it work in a way that isn't complex then go
>>> for it. *It appears it might work today but I wish to break this.*
>>>
>>
>> Throwing an exception for overwrite=false when using UpdateLog will
>> impact some users though (back compat break and performance
>> regression), so I'd like to understand what we gain, and if it's enough
>> of a gain to compensate. If updating nested docs doesn't immediately
>> implement overwrite=false, it doesn't seem like a big deal. Prohibiting
>> it from ever working on the other hand, doesn't seem like the right
>> trade-off.
>>
>>
>>> * Consequently, ConvertedLegacyTest needs fixing. If the intent of the
>>> legacy test is to see that the document can be added and violate the
>>> uniqueKey, then this test needs to use a config without the UpdateLog.
>>> Or we keep the default config (with UpdateLog) and adjust the test's
>>> expectations.
>>>
>>
>> +1
>>
>> -Yonik
>>
>>
>>>
>>> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>>>
>>>> Yep, it's only for performance. I know a number of people using
>>>> overwrite=false when doing bulk indexing, and then often later using
>>>> normal adds for incremental changes.
>>>>
>>>> As far as why "overwrite(Pending|Committed)?" exists at all: it's
>>>> been there since Solr was open sourced (SOLR-1), so there wouldn't be
>>>> a discussion to find. Lucene had no concept of unique IDs or
>>>> overwriting at the time and it was all implemented in Solr-land. The
>>>> cost to enforce was significant (and still can be today), and often
>>>> unneeded when building an index from a source known to have unique
>>>> IDs already.
>>>>
>>>> -Yonik
>>>>
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
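
For readers following the thread, here is a minimal sketch of the semantics being discussed: with overwrite=true an add first removes any existing document with the same uniqueKey, while overwrite=false skips that lookup and simply appends. This is a toy in-memory model, not Solr's implementation; `ToyIndex`, `add`, and `count` are hypothetical names made up for illustration, and the UpdateLog is not modeled at all.

```python
class ToyIndex:
    """Toy stand-in for an index with a uniqueKey field (hypothetical, not Solr code)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = []

    def add(self, doc, overwrite=True):
        if overwrite:
            # Enforce uniqueness: drop any existing doc with the same key first.
            key = doc[self.unique_key]
            self.docs = [d for d in self.docs if d[self.unique_key] != key]
        # With overwrite=False the lookup above is skipped entirely, which is
        # where the savings come from, and also why duplicates can slip in.
        self.docs.append(doc)

    def count(self, key):
        # Number of stored docs carrying the given uniqueKey value.
        return sum(1 for d in self.docs if d[self.unique_key] == key)


index = ToyIndex()

# Bulk load from a source assumed to have unique IDs: no uniqueness check.
index.add({"id": "1", "title": "first"}, overwrite=False)
index.add({"id": "2", "title": "second"}, overwrite=False)

# If that assumption is wrong, the uniqueKey constraint is silently violated.
index.add({"id": "1", "title": "duplicate"}, overwrite=False)
dupes_after_bulk = index.count("1")

# A later normal add replaces every doc that shares the key.
index.add({"id": "1", "title": "replacement"}, overwrite=True)
after_overwrite = index.count("1")
```

In this toy model the third add leaves two documents with id "1", and the final overwriting add collapses them back to one, which matches the bulk-load-then-incremental-update pattern described in the quoted messages.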
