On Thursday 26 January 2006 09:47, Chun Wei Ho wrote:
> Hi,
> 
> Thanks for the help, just a few more questions:
> 
> On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > > I am attempting to prune an index by getting each document in turn and
> > > then checking/deleting it:
> > >
> > > IndexReader ir = IndexReader.open(path);
> > > for(int i=0;i<ir.numDocs();i++) {
> > >       Document doc = ir.document(i);
> > >       if(thisDocShouldBeDeleted(doc)) {
> > >               ir.delete(docNum); // <- I need the docNum for doc.
> > >       }
> > > }
> > >
> > > How do I get the docNum for IndexReader.delete() function in the above
> > > case? Is there an API function I am missing? I am working with a merged
> >
> > The document number is the variable i in this case.
> If the document number is the variable i (enumerated from numDocs()),
> what's the difference between numDocs() and maxDoc() in this case? I
> was previously under the impression that the internal docNum might be
> different to the counter.

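Iirc, the difference between maxDoc() and numDocs() is the number of
deleted documents: maxDoc() is one greater than the largest document
number, while numDocs() counts only the documents that are not deleted.
So the loop should run up to maxDoc(), not numDocs(). Check the javadocs
to be sure.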
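
To make the counting concrete, here is a minimal sketch. It is a toy
in-memory model and not Lucene itself: the class ToyIndex and the
prune() helper are made up for illustration, and only the method names
maxDoc(), numDocs(), isDeleted(), document() and delete() mirror
IndexReader. It assumes that a delete only marks a document number and
leaves the numbering unchanged until segments are merged.

```java
// Toy in-memory model (NOT Lucene): document numbers are slots 0..maxDoc()-1,
// and deleting a document only marks its slot; numbers are not reassigned here.
class ToyIndex {
    private final String[] docs;      // stored content per document number
    private final boolean[] deleted;  // deletion mark per document number

    ToyIndex(String... docs) {
        this.docs = docs.clone();
        this.deleted = new boolean[docs.length];
    }

    // One greater than the largest document number.
    int maxDoc() { return docs.length; }

    // Number of documents that are not marked as deleted.
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) live++;
        }
        return live;
    }

    boolean isDeleted(int n) { return deleted[n]; }

    // Refuses to return a deleted document, so callers must check isDeleted() first.
    String document(int n) {
        if (deleted[n]) throw new IllegalArgumentException("deleted document: " + n);
        return docs[n];
    }

    void delete(int n) { deleted[n] = true; }

    // Hypothetical helper: visit every slot up to maxDoc(), skip slots that are
    // already deleted, and delete the documents the predicate selects.
    static int prune(ToyIndex ir, java.util.function.Predicate<String> shouldDelete) {
        int removed = 0;
        for (int i = 0; i < ir.maxDoc(); i++) {
            if (ir.isDeleted(i)) continue;
            if (shouldDelete.test(ir.document(i))) {
                ir.delete(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Using numDocs() as the loop bound in this model would stop short of the
highest document numbers as soon as anything has been deleted, because
numDocs() shrinks with every delete while the numbering stays the same.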

> 
> > > index over different segments so the docNum might not be in running
> > > sequence with the counter i.
> > > In general, is there a better way to do this sort of thing?
> >
> > This code:
> >
> >         Document doc = ir.document(i);
> >
> > normally retrieves all the stored fields of the document and that is
> > quite costly. In case you know that the document(s) to be deleted
> > match(es) a Term, it's better to use IndexReader.delete(Term).
> 
> I'm doing something akin to a rangeQuery, where I delete documents
> within a certain range (in addition to other criteria). Is it better
> to do a query on the range, mark all the docNums getting them with
> Hits.id(), and then retrieve docs and test for deletion according to
> that?

In that case it is faster to enumerate the Terms that the range query
expands to and pass each of them to IndexReader.delete(Term), because
that avoids loading the stored fields of every candidate document.
To see how the terms are generated, have a look at the source code of the
rewrite() method of RangeQuery here:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/
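
The idea can be sketched as follows. This is again a toy model and not
Lucene code: ToyRangeDelete, deleteRange() and the sorted map standing in
for the term dictionary of one field are assumptions made for
illustration. In Lucene itself the terms would come from a TermEnum
obtained with IndexReader.terms(Term), and each term in the range would
be passed to IndexReader.delete(Term).

```java
// Toy model (NOT Lucene): a sorted term dictionary for a single field, mapping
// each term to the document numbers that contain it.
class ToyRangeDelete {
    // Marks as deleted every document that holds a term in [lower, upper],
    // both bounds inclusive, and returns how many documents were newly marked.
    // Only the term dictionary is walked; no stored fields are loaded.
    static int deleteRange(java.util.NavigableMap<String, int[]> termToDocs,
                           boolean[] deleted, String lower, String upper) {
        int removed = 0;
        for (int[] docs : termToDocs.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                if (!deleted[doc]) {
                    deleted[doc] = true;
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

If the other criteria cannot be expressed as terms, they would still
need a per-document check, but only for the documents that the range
selects.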

Regards,
Paul Elschot

