Hi, thanks for the help. Just a few more questions:

On 1/26/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
> > I am attempting to prune an index by getting each document in turn and
> > then checking/deleting it:
> >
> > IndexReader ir = IndexReader.open(path);
> > for(int i=0;i<ir.numDocs();i++) {
> >     Document doc = ir.document(i);
> >     if(thisDocShouldBeDeleted(doc)) {
> >         ir.delete(docNum); // <- I need the docNum for doc.
> >     }
> > }
> >
> > How do I get the docNum for IndexReader.delete() function in the above
> > case? Is there a API function I am missing? I am working with a merged
>
> The document number is the variable i in this case.

If the document number is the variable i (enumerated from numDocs()),
what's the difference between numDocs() and maxDoc() in this case? I
was previously under the impression that the internal docNum might be
different to the counter.

> > index over different segments so the docNum might not be in running
> > sequence with the counter i.
> > In general, is there a better way to do this sort of thing?
>
> This code:
> > Document doc = ir.document(i);
> normally retrieves all the stored fields of the document and that is
> quite costly. In case you know that the document(s) to be deleted
> match(es) a Term, it's better to use IndexReader.delete(Term).

I'm doing something akin to a rangeQuery, where I delete documents
within a certain range (in addition to other criteria). Is it better
to do a query on the range, mark all the docNums getting them with
Hits.id(), and then retrieve docs and test for deletion according to
that?

Thanks for the help
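
For reference, here is a minimal sketch of the numDocs()/maxDoc() distinction asked about above. It assumes the usual Lucene semantics: maxDoc() is one greater than the largest document number, while numDocs() counts only documents that are not deleted. ToyIndex, prune and PruneSketch are hypothetical names for a small in-memory stand-in, not part of the Lucene API. Only the method names numDocs(), maxDoc(), isDeleted() and delete() mirror IndexReader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical toy model of a document-number space; NOT the Lucene API.
public class PruneSketch {

    public static class ToyIndex {
        private final int maxDoc;                     // one greater than the largest doc number
        private final BitSet deleted = new BitSet();  // doc numbers marked as deleted

        public ToyIndex(int docCount) { this.maxDoc = docCount; }

        public int maxDoc() { return maxDoc; }
        public int numDocs() { return maxDoc - deleted.cardinality(); }
        public boolean isDeleted(int n) { return deleted.get(n); }
        public void delete(int n) { deleted.set(n); }
    }

    // Visit every slot below maxDoc(), skip slots that are already deleted,
    // and delete the ones the predicate selects. Returns the deleted numbers.
    public static List<Integer> prune(ToyIndex index, IntPredicate shouldDelete) {
        List<Integer> removed = new ArrayList<>();
        for (int i = 0; i < index.maxDoc(); i++) {
            if (index.isDeleted(i)) {
                continue; // a deleted slot has no document to inspect
            }
            if (shouldDelete.test(i)) {
                index.delete(i);
                removed.add(i);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(6);
        index.delete(1);
        index.delete(2);
        // numDocs() is now 4 and maxDoc() is still 6, so a loop bounded by
        // numDocs() would never reach doc numbers 4 and 5.
        List<Integer> removed = prune(index, n -> n >= 4);
        System.out.println("removed=" + removed + " numDocs=" + index.numDocs());
    }
}
```

Under these assumptions the counter i is the document number, but the loop bound should be maxDoc() with a check on isDeleted(i). A bound of numDocs() shrinks as documents are deleted and stops short of the highest document numbers.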