Here's a couple of fragments, alter to suit.... public void doRemove(Directory dir) throws Exception {
this.reader = IndexReader.open(dir); TermEnum theTerms = this.reader.terms(new Term("unique_field", "")); Term term = null; do { term = theTerms.term(); if ((term == null) || ! term.field().equalsIgnoreCase("doc_id")) { break; } if (theTerms.docFreq() > 1) { this.removeDupsForTerm(term); } } while (theTerms.next()); } private void removeDupsForTerm(Term term) throws Exception { TermDocs td = this.reader.termDocs(term); for ( int idx = 0; td.next(); ++idx) { if (idx > 0) { this.reader.deleteDocument(td.doc()); } } } On 10/2/07, Johnny R. Ruiz III <[EMAIL PROTECTED]> wrote: > > Hi Daniel, > > Tnx, but forgive my ignorance.. can u give me a sample code to do it > :). I have never used termDocs() before. > > Tnx, > Johnny > > ----- Original Message ---- > From: Daniel Noll <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Tuesday, October 2, 2007 12:00:07 PM > Subject: Re: Index Dedupe > > On Tuesday 02 October 2007 12:25:47 Johnny R. Ruiz III wrote: > > Hi, > > > > I can't seem to find a way to delete duplicate in lucene index. I > hve a > > unique key so it seems to be straight forward. But I can't find a > simple > > way to do it except for putting each record in the index into HashMap. > > Are there any method in lucene package that I could use? > > I would use termDocs() to iterate all the terms in that field. Then skip > the > first doc for each term and delete all subsequent ones. > > Daniel > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > ____________________________________________________________________________________ > Need a vacation? Get great deals > to amazing places on Yahoo! Travel. > http://travel.yahoo.com/