Is it even sensible to want overwrite=false and have an UpdateLog? That is, doesn't the weight of the UpdateLog far exceed whatever savings are had with overwrite=false? I suspect that combining these two today has edge cases we don't even realize, despite the apparent lack of exceptions.

On Thu, Sep 20, 2018 at 3:11 PM Yonik Seeley <[email protected]> wrote:

> On Thu, Sep 20, 2018 at 2:18 PM David Smiley <[email protected]>
> wrote:
>
>> Thanks for the context.
>>
>> I'd like to do a few things:
>> * Document that "overwrite" is a (potentially dangerous) performance hack
>> for documents that are assumed to be already unique. It is not to be used
>> to deliberately violate the uniqueKey constraint; this is considered
>> erroneous and unsupported use.
>>
>
> +1
>
>
>> * Document *and enforce* that "overwrite" does not work with the
>> UpdateLog. User error; let them know. While we maintain the UpdateLog, I
>> don't want to have the complexity burden of considering how to support
>> overwrite=true. I'm not saying it's super complex, only that UpdateLog is
>> already complex and I don't think the value of overwrite is good enough for
>> me to want to maintain the two together. I hope others can appreciate this
>> point; I don't wish to be difficult. If someone volunteers to make it
>> work in a way that isn't complex then go for it. *It appears it might
>> work today but I wish to break this.*
>>
>
> Throwing an exception for overwrite=false when using UpdateLog will impact
> some users though (back compat break and performance regression), so I'd
> like to understand what we gain, and if it's enough of a gain to
> compensate. If updating nested docs doesn't immediately implement
> overwrite=false, it doesn't seem like a big deal. Prohibiting it from ever
> working, on the other hand, doesn't seem like the right trade-off.
>
>
>> * Consequently, ConvertedLegacyTest needs fixing. If the intent of the
>> legacy test is to see that the document can be added and violate the
>> uniqueKey, then this test needs to use a config without the UpdateLog. Or
>> we keep the default config (with UpdateLog) and adjust the test's
>> expectations.
>>
>
> +1
>
> -Yonik
>
>
>>
>> On Thu, Sep 20, 2018 at 1:42 PM Yonik Seeley <[email protected]> wrote:
>>
>>> Yep, it's only for performance. I know a number of people using
>>> overwrite=false when doing bulk indexing, and then often later using normal
>>> adds for incremental changes.
>>>
>>> As far as why "overwrite(Pending|Committed)?" exists at all: it's been
>>> there since Solr was open sourced (SOLR-1), so there wouldn't be a
>>> discussion to find. Lucene had no concept of unique IDs or overwriting at
>>> the time and it was all implemented in Solr-land. The cost to enforce was
>>> significant (and still can be today), and often unneeded when building an
>>> index from a source known to have unique IDs already.
>>>
>>> -Yonik
>>>
>>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
