Depends what "similar" means. If by similar, you mean they contain alot of the same words/phrases, then you can probably use a query (although the number you can have is limited to 32 or 64 I think) and get documents using lucene.
If by similar you are trying to determine if the text in some documents is byte/byte the same except for some small deviations, you are probably interested in using a nilsima signature. If you have some words/phrases that give you are starting point of documents to check for similiarity, you could use Lucene first, and then nilsima second. If you are talking about conceptual similarity, you probably have a big research project on your hands. -----Original Message----- From: Bruce Ritchie [mailto:[EMAIL PROTECTED] Sent: Thursday, May 29, 2003 1:41 PM To: Lucene Users List Subject: Re: Find Documents 'Similar' to Another Wirthlin, Rick - Workstream wrote: > I have a requirement to find documents similar to another. Can that be accomplished using a PhraseQuery, or some other way? One option I'm looking at to get this functionality is the InfoWrangler product (www.infowrangler.com) as it does this and seems to be at least partially based upon lucene. If other people know of other (available) options I'd love to hear of them as well. Regards, Bruce Ritchie --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
