Re: IndexWriter.deleteDocuments(Query query)

John Wang Wed, 01 Apr 2009 14:22:56 -0700

Hi Michael:

    1) Yes, we use TermDocs, exactly what IndexWriter.deleteDocuments(Term)
is doing under the cover.
    2) We iterate the docid->uid mapping, for each docid, get the
corresponding ui and check that to see if that is in the deleted set. If so,
add the docid to the list. There is no uid->docid lookup needed.


      However, in our sharded architecture, we partition by continuous uids,
in which case we keep both mappings since we know the range of the the uid.
In which case, uid->docid mapping is available.

-John

On Wed, Apr 1, 2009 at 11:27 AM, Michael McCandless <
[email protected]> wrote:

> On Wed, Apr 1, 2009 at 2:04 PM, John Wang <[email protected]> wrote:
>
> > My test essentially this. I took out the reader.deleteDocuments call from
> > both scenarios. I took a index of 5m docs. a batch of 10000 randomly
> > generated uids.
> >
> > Compared the following scenarios:
> > 1)
> > * open index reader
> > * for each uid in the batch, find the corresponding docid and add to an
> > IntList.
> > *close reader
>
> How exactly do you find the corresponding docid?  TermDocs?
>
> > 2)
> > * open index reader
> > * load uid array from payload field
> > * iterate uid array, and check to see if uid is in deleted set, and add
> to
> > an IntList
>
> In this case, each doc has a dedicated field that only has a payload
> that stores the one uid for that doc?  But I'm confused how you then
> map from uid -> docID.  I must be missing something.
>
> > The datastructure holding deleted set is IntOpenHashSet from fastutil.
> >
> > 1) took about 3500 - 4500 ms
> > 2) took about 815 ms
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: IndexWriter.deleteDocuments(Query query)

Reply via email to