@Erick I know all that. My point is: Lucene is NRT, while GET is RT (in both ES and Solr). How does Lucene return the right document (via a TermQuery) on a GET before a commit has been done?

On Sat, Apr 29, 2017 at 6:48 PM, Erick Erickson <[email protected]> wrote:
> Internal magic. This is the point behind the admonitions against
> relying on the _internal_ Lucene doc ID for anything... it changes.
> For the same document. Even if the document doesn't change.
>
> Essentially when two segments are merged (and this is conceptual, not
> the code) all the live docs from the first segment are written into
> the new segment starting at 0 (or 1, like I said this is conceptual).
> This (plus the base for the segment) is the internal Lucene doc ID.
> When the first segment is written, say the last internal ID is 100,
> then the first live doc of the next segment gets 101, etc.
>
> Now here's the magic. The <uniqueKey> in Solr is just some random
> field as far as Lucene is concerned. Solr knows what field this is. So
> when you try to, say, update a doc with a <uniqueKey>, _Solr_ looks
> around and asks Lucene "do you have any docs with this value in this
> field". If the answer is yes, then Solr has the internal Lucene doc ID
> and can do the right thing. So for atomic updates, Solr gets the right
> Lucene doc, pulls all the stored data from the internal Lucene doc and
> constructs another brand-new Lucene doc and applies the updates
> specified. Then Solr tells Lucene to delete the doc with the internal
> Lucene doc ID and then tells Lucene to index this new doc. Lucene
> assigns it a new internal Lucene doc ID.
>
> Best,
> Erick
>
> On Sat, Apr 29, 2017 at 5:38 AM, Dorian Hoxha <[email protected]> wrote:
> > Hi Shawn,
> >
> > ES has the same thing. You're right that no 'id' is needed when adding a
> > document.
> >
> > But lucene has updateDocument:
> > https://lucene.apache.org/core/6_4_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument-org.apache.lucene.index.Term-java.lang.Iterable-
> > ?
> >
> > Or that doesn't need a commit before deleting docs by Term, so it can
> > atomically delete+insert ?
> >
> > So how do solr/es connect the internal docId to the outside primary-key (to
> > support get,delete) ? Or by just storing/indexing the primary-key in a field
> > ?
> >
> > Thanks,
> > Dorian
> >
> > On Fri, Apr 28, 2017 at 10:34 PM, Shawn Heisey <[email protected]> wrote:
> >>
> >> On 4/28/2017 6:16 AM, Dorian Hoxha wrote:
> >> > I searched for this on mailing-list,issues etc, but couldn't find any
> >> > post.
> >> >
> >> > So, why not have the possibility of <composite_id> ?
> >> > Or nobody cared enough to implement it ? Or no gains ?
> >>
> >> To my knowledge, and I hope someone can correct me if I'm wrong, Lucene
> >> generally has absolutely no concept of a primary key at all, much less
> >> one that's composite. At its core, Lucene won't complain if you index
> >> the same document twice -- both copies will be present.
> >>
> >> Solr (and probably a LOT of user-written Lucene code before that)
> >> introduced the concept of a uniqueKey field. When a duplicate document
> >> is indexed to Solr, it is Solr that finds/deletes the original, not
> >> Lucene. I feel quite confident in saying that ES has the same
> >> functionality, though I have not confirmed it.
> >>
> >> Thanks,
> >> Shawn
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
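
For anyone reading this thread in the archive, here is a minimal sketch of the behaviour Erick and Shawn describe, written as a toy model in plain Python. It is not Lucene or Solr code: ToyIndex and all of its methods (add_document, flush, find, update_document, atomic_update, merge) are hypothetical names made up for illustration, and segments are modelled as simple lists. It only tries to show three things from the thread: a plain add does no uniqueness check, an update by key is "delete the old doc, index a new one", and internal doc IDs change on merge even when the document itself does not.

```python
class ToyIndex:
    """Toy model of a segmented index. Hypothetical, not Lucene's real structures."""

    def __init__(self, key_field="id"):
        self.key_field = key_field   # plays the role of Solr's uniqueKey field
        self.segments = [[]]         # each segment is a list of [doc, is_live]

    def add_document(self, doc):
        # Plain add: no primary-key check, so duplicates are allowed.
        self.segments[-1].append([dict(doc), True])

    def flush(self):
        # Close the current segment; later adds go into a new one.
        if self.segments[-1]:
            self.segments.append([])

    def _entries(self):
        # Yield (internal_doc_id, entry); the ID is segment base + local position.
        # Deleted docs still occupy an ID until a merge drops them.
        base = 0
        for segment in self.segments:
            for local_id, entry in enumerate(segment):
                yield base + local_id, entry
            base += len(segment)

    def find(self, field, value):
        # "Do you have any docs with this value in this field?"
        return [doc_id for doc_id, (doc, live) in self._entries()
                if live and doc.get(field) == value]

    def get(self, doc_id):
        for current_id, (doc, live) in self._entries():
            if current_id == doc_id and live:
                return doc
        return None

    def update_document(self, doc):
        # Delete every live doc with the same key value, then add the new doc.
        key = doc[self.key_field]
        for _, entry in self._entries():
            if entry[1] and entry[0].get(self.key_field) == key:
                entry[1] = False
        self.add_document(doc)

    def atomic_update(self, key, changes):
        # Look up the old doc by key, rebuild a brand-new doc, swap it in.
        ids = self.find(self.key_field, key)
        if not ids:
            raise KeyError(key)
        rebuilt = {**self.get(ids[0]), **changes}
        self.update_document(rebuilt)

    def merge(self):
        # Copy only live docs into one new segment; IDs restart from 0.
        live = [entry for _, entry in self._entries() if entry[1]]
        self.segments = [live]


idx = ToyIndex()
idx.add_document({"id": "a", "title": "one"})   # internal ID 0
idx.add_document({"id": "b", "title": "two"})   # internal ID 1
idx.flush()
idx.atomic_update("a", {"title": "uno"})        # old "a" deleted, new "a" gets ID 2
before = (idx.find("id", "a"), idx.find("id", "b"))   # ([2], [1])
idx.merge()
after = (idx.find("id", "a"), idx.find("id", "b"))    # ([1], [0])
```

Note that doc "b" was never touched, yet its internal ID moves from 1 to 0 after the merge, which is why the key lookup (and not the internal ID) is the only stable handle. The sketch deliberately says nothing about how a realtime GET sees uncommitted documents; that part of the question is outside what this model covers.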
