Detection of index duplicates in Lucene

Dmitry Sat, 28 Jul 2007 23:17:37 -0700

We are trying to find out whether there is any implementation for Lucene that detects index duplicates. Assume we have a set of documents, and each document is a bunch of words. After we have created indexes for the same document, we need to know that all index entries will be unique for that specific document (lexical equivalence).

Is there an implementation that covers both the situation where the algorithm has failed to detect a duplicate and the situation where the algorithm has reported a false duplicate? In that case we could find all duplicate index entries.

We could use the same algorithm to detect duplicate documents. In that case we would save time and get better performance by not running the indexing services against a document that is already indexed.
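
To make the idea concrete, here is a rough sketch of the kind of check we have in mind. It is plain Java and does not use any Lucene API; the class name DuplicateDetector and its methods are hypothetical names made up for this example. It assumes "lexical equivalence" means that two documents have the same sequence of lowercased words, ignoring punctuation and whitespace, and it fingerprints that sequence with SHA-256.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class DuplicateDetector {
    // Fingerprints of every document accepted so far.
    private final Set<String> seen = new HashSet<>();

    // Assumption: lexical equivalence = same lowercased word sequence.
    // Splits on runs of characters that are neither letters nor digits.
    static String normalize(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (t.isEmpty()) continue;          // leading separator yields ""
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }

    // Hex-encoded SHA-256 digest of the normalized text.
    static String fingerprint(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the document is new (and should be indexed),
    // false if a lexically equivalent document was already seen.
    boolean addIfNew(String text) {
        return seen.add(fingerprint(text));
    }
}
```

The indexing service would call addIfNew(text) before indexing and skip the document when it returns false. An exact fingerprint like this should not report false duplicates apart from hash collisions, but it will miss near-duplicates that differ by even one word, which is the "duplicate not detected" case asked about above.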


Any suggestions would be appreciated.

Thanks,

DT,

www.ejinz.com

Search Engine News




