@Erick

I know all that. My point is that Lucene is NRT (near-real-time), while GET
is RT (real-time) in both ES and Solr. How does Lucene return the right
document (via a TermQuery) on a GET before a commit has been done?

On Sat, Apr 29, 2017 at 6:48 PM, Erick Erickson <[email protected]>
wrote:

> Internal magic. This is the point behind the admonitions against
> relying on the _internal_ Lucene doc ID for anything... it changes.
> For the same document. Even if the document doesn't change.
>
> Essentially, when two segments are merged (and this is conceptual, not
> the code), all the live docs from the first segment are written into
> the new segment starting at 0 (or 1, like I said this is conceptual).
> This (plus the base for the segment) is the internal Lucene doc ID.
> When the first segment is written, say the last internal ID is 100,
> then the first live doc of the next segment gets 101, etc.
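
To make the renumbering concrete, here is a toy sketch in Python. It is not Lucene code: the Segment class, global_ids() and merge() are names made up for illustration, and it only models "internal ID = segment base + position in segment", as described above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: list position = segment-local doc ID."""
    docs: list
    deleted: set = field(default_factory=set)  # local IDs marked as deleted

    def live_docs(self):
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


def global_ids(segments):
    """Map each live doc's key to its internal ID (segment base + local ID)."""
    ids, base = {}, 0
    for seg in segments:
        for local_id, doc in enumerate(seg.docs):
            if local_id not in seg.deleted:
                ids[doc["id"]] = base + local_id
        base += len(seg.docs)  # deleted docs still occupy ID space until merged away
    return ids


def merge(first, second):
    """Write only the live docs of both segments into a new segment, starting at 0."""
    return Segment(docs=first.live_docs() + second.live_docs())


seg_a = Segment(docs=[{"id": "a"}, {"id": "b"}, {"id": "c"}], deleted={1})
seg_b = Segment(docs=[{"id": "d"}, {"id": "e"}])

before = global_ids([seg_a, seg_b])
after = global_ids([merge(seg_a, seg_b)])
print(before)
print(after)
```

Docs "c", "d" and "e" end up with different internal IDs after the merge even though none of them changed, which is why the internal ID cannot be used as a stable key.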
>
> Now here's the magic. The <uniqueKey> in Solr is just some random
> field as far as Lucene is concerned. Solr knows what field this is. So
> when you try to, say, update a doc with a <uniqueKey>, _Solr_ looks
> around and asks Lucene "do you have any docs with this value in this
> field". If the answer is yes, then Solr has the internal Lucene doc ID
> and can do the right thing. So for atomic updates, Solr gets the right
> Lucene doc, pulls all the stored data from the internal Lucene doc and
> constructs another brand-new Lucene doc and applies the updates
> specified. Then Solr tells Lucene to delete the doc with the internal
> Lucene doc ID and then tells Lucene to index this new doc. Lucene
> assigns it a new internal Lucene doc ID.
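
The atomic-update flow described above can be sketched roughly like this. Again, this is a toy model in Python and not Solr or Lucene code: ToyIndex and atomic_update() are hypothetical names, and the "index" is just a list where the position plays the role of the internal doc ID.

```python
class ToyIndex:
    """Hypothetical stand-in for the index: list position = internal doc ID."""

    def __init__(self):
        self.docs = []
        self.deleted = set()

    def add(self, doc):
        """Index a doc and return the internal ID assigned to it."""
        self.docs.append(dict(doc))
        return len(self.docs) - 1

    def find(self, field_name, value):
        """Answer: 'do you have any live docs with this value in this field?'"""
        return [i for i, d in enumerate(self.docs)
                if i not in self.deleted and d.get(field_name) == value]

    def stored_fields(self, doc_id):
        return dict(self.docs[doc_id])

    def delete(self, doc_id):
        self.deleted.add(doc_id)


def atomic_update(index, unique_key, key_value, changes):
    """Look up by unique key, rebuild the doc with the changes, delete old, add new."""
    matches = index.find(unique_key, key_value)
    if not matches:
        return index.add({unique_key: key_value, **changes})
    old_id = matches[0]
    new_doc = index.stored_fields(old_id)  # pull all stored data from the old doc
    new_doc.update(changes)                # apply the updates specified
    index.delete(old_id)                   # the old internal ID is now dead
    return index.add(new_doc)              # the new doc gets a new internal ID


index = ToyIndex()
first_id = index.add({"id": "doc-1", "title": "old", "views": 1})
second_id = atomic_update(index, "id", "doc-1", {"title": "new"})
print(first_id, second_id, index.stored_fields(second_id))
```

The only thing tying the old and new docs together is the value in the unique key field; the internal ID changes on every update.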
>
>
> Best,
> Erick
>
> On Sat, Apr 29, 2017 at 5:38 AM, Dorian Hoxha <[email protected]>
> wrote:
> > Hi Shawn,
> >
> > ES has the same thing. You're right that no 'id' is needed when adding a
> > document.
> >
> > But lucene has updateDocument:
> > https://lucene.apache.org/core/6_4_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument-org.apache.lucene.index.Term-java.lang.Iterable-
> > ?
> >
> > Or does that not need a commit before deleting docs by Term, so it can
> > atomically delete+insert?
> >
> > So how do solr/es connect the internal docId to the outside primary-key (to
> > support get,delete)? Or by just storing/indexing the primary-key in a field?
> >
> > Thanks,
> > Dorian
> >
> >
> > On Fri, Apr 28, 2017 at 10:34 PM, Shawn Heisey <[email protected]> wrote:
> >>
> >> On 4/28/2017 6:16 AM, Dorian Hoxha wrote:
> >> > I searched for this on mailing-list,issues etc, but couldn't find any
> >> > post.
> >> >
> >> > So, why not have the possibility of <composite_id> ?
> >> > Or nobody cared enough to implement it ? Or no gains ?
> >>
> >> To my knowledge, and I hope someone can correct me if I'm wrong, Lucene
> >> generally has absolutely no concept of a primary key at all, much less
> >> one that's composite.  At its core, Lucene won't complain if you index
> >> the same document twice -- both copies will be present.
> >>
> >> Solr (and probably a LOT of user-written Lucene code before that)
> >> introduced the concept of a uniqueKey field.  When a duplicate document
> >> is indexed to Solr, it is Solr that finds/deletes the original, not
> >> Lucene.  I feel quite confident in saying that ES has the same
> >> functionality, though I have not confirmed it.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
>
