Re: Document ID shuffling under 2.3.x (on merge?)

Erick Erickson Tue, 11 Mar 2008 15:54:30 -0700

But to me, it always seems...er...fraught to even *think* about relying
on doc ids. I know you've been around the block with Lucene, but do you
have a compelling reason to use the doc ID and not your own unique ID?


Best
Erick

On Tue, Mar 11, 2008 at 5:39 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:

> On Tuesday 11 March 2008 19:55:39 Michael McCandless wrote:
> > Hi Daniel,
> >
> > 2.3 should be no different from 2.2 in that docIDs only "shift" when
> > a merge of segments with deletions completes.
> >
> > Could it be the ConcurrentMergeScheduler?  Merges now run in the
> > background by default and commit whenever they complete.  You can get
> > back to the previous (blocking) behavior by using
> > SerialMergeScheduler instead.
>
> That was my first thought, but SerialMergeScheduler doesn't cause the
> problem.
> I've done a little more investigation since; it turns out that if I don't
> call optimize() then the problem doesn't occur.
>
> Could it be that optimize(int,boolean) is storing the segments to optimise
> in
> a HashSet, which by its nature reorders the segments?
>
> > If it's not that ... can you provide more details about how your
> > applications is relying on docIDs?
>
> As far as that, we assume that if there are N documents in the index then
> the
> next document ID will be N (we determine this before adding the document.)
> As we're only doing this in a single thread and we never delete documents,
> this was previously safe.
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Re: Document ID shuffling under 2.3.x (on merge?)

Reply via email to