Here's a couple of fragments, alter to suit....
public void doRemove(Directory dir) throws Exception
{
this.reader = IndexReader.open(dir);
TermEnum theTerms = this.reader.terms(new Term("unique_field", ""));
Term term = null;
do {
term = theTerms.term();
if ((term == null) || ! term.field().equalsIgnoreCase("doc_id"))
{
break;
}
if (theTerms.docFreq() > 1) {
this.removeDupsForTerm(term);
}
} while (theTerms.next());
}
private void removeDupsForTerm(Term term) throws Exception
{
TermDocs td = this.reader.termDocs(term);
for ( int idx = 0; td.next(); ++idx) {
if (idx > 0) {
this.reader.deleteDocument(td.doc());
}
}
}
On 10/2/07, Johnny R. Ruiz III <[EMAIL PROTECTED]> wrote:
>
> Hi Daniel,
>
> Tnx, but forgive my ignorance.. can u give me a sample code to do it
> :). I have never used termDocs() before.
>
> Tnx,
> Johnny
>
> ----- Original Message ----
> From: Daniel Noll <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, October 2, 2007 12:00:07 PM
> Subject: Re: Index Dedupe
>
> On Tuesday 02 October 2007 12:25:47 Johnny R. Ruiz III wrote:
> > Hi,
> >
> > I can't seem to find a way to delete duplicate in lucene index. I
> hve a
> > unique key so it seems to be straight forward. But I can't find a
> simple
> > way to do it except for putting each record in the index into HashMap.
> > Are there any method in lucene package that I could use?
>
> I would use termDocs() to iterate all the terms in that field. Then skip
> the
> first doc for each term and delete all subsequent ones.
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
>
>
>
>
>
>
> ____________________________________________________________________________________
> Need a vacation? Get great deals
> to amazing places on Yahoo! Travel.
> http://travel.yahoo.com/