Thanks Otis! Are there Lucene 1.4-compatible versions of these classes?
Thanks very much
-Hareesh

-----Original Message-----
From: Otis Gospodnetic
To: java-user@lucene.apache.org
Sent: 7/16/2005 3:30 AM
Subject: Re: Searching for similar documents

We've got this in Lucene's contrib/:

$ ll contrib/similarity/src/java/org/apache/lucene/search/similar/*java
-rwxrwxr-x 1 otis otis 30431 Jul 9 09:20 MoreLikeThis.java*
-rwxrwxr-x 1 otis otis 3612 Mar 16 17:31 SimilarityQueries.java*

Otis

--- "Kadlabalu, Hareesh" <[EMAIL PROTECTED]> wrote:

> Hi,
> I am trying to build a search utility that looks for 'similarities' between
> documents.
> In other words, for every document listed as a part of search result for a
> phrase, I want to be able to list documents that are similar to it (but not
> necessarily match the same search criterion). For example, if my search for
> "Tomcat" returned "Tomcat installation guide", I want to write a utility
> that looks for all similar installation guides that may or may not be
> related to Tomcat.
>
> One approach I am thinking is to use term vectors. Algorithm: first extract
> the top X term vectors from the current document and create a Boolean query
> for those terms. Run it against contents of other documents (I will probably
> have to remove commonly used terms manually?). Resulting documents should be
> similar to the original one.
>
> I am wondering if something like this already exists or someone has a better
> algorithm/solution. Or am I headed off in the wrong direction with this
> algorithm? Your advice is highly appreciated.
>
> Thanks
> -Hareesh
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
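For illustration, here is a rough sketch of the term-selection approach described in the quoted message. It is written in plain Java 8+ with no Lucene dependency so that it runs standalone. Several things here are assumptions: the corpus is an in-memory list of pre-tokenized documents, "top X terms" is read as the X terms with the highest tf*idf weight, and the Boolean query is simulated by an all-optional (OR) match whose score is the sum of the matched terms' idf. The class and method names (SimilarDocs, topTerms, similar) are hypothetical. Against a real index, term frequencies would more likely come from a stored term vector (IndexReader.getTermFreqVector) and document frequencies from IndexReader.docFreq. The contrib MoreLikeThis class also builds its query from tf-idf-ranked terms, in a broadly similar spirit, although its exact scoring differs from this toy version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: "top X terms -> OR query" similarity, no Lucene involved.
public class SimilarDocs {

    // Document frequency: number of documents that contain each term.
    public static Map<String, Integer> docFreqs(List<String[]> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (String[] doc : corpus) {
            for (String term : new HashSet<>(Arrays.asList(doc))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // The x highest-weighted terms of document docId, weight = tf * idf.
    public static List<String> topTerms(List<String[]> corpus, int docId, int x) {
        Map<String, Integer> df = docFreqs(corpus);
        Map<String, Integer> tf = new HashMap<>();
        for (String term : corpus.get(docId)) {
            tf.merge(term, 1, Integer::sum);
        }
        int n = corpus.size();
        Map<String, Double> weight = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // idf = log(N / df) is 0 for a term found in every document, so
            // ubiquitous words sink without a hand-maintained stop list.
            double idf = Math.log((double) n / df.get(e.getKey()));
            weight.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(weight.keySet());
        terms.removeIf(t -> weight.get(t) <= 0.0); // drop terms with no discriminating power
        terms.sort((a, b) -> {
            int c = Double.compare(weight.get(b), weight.get(a)); // heavier first
            return c != 0 ? c : a.compareTo(b);                   // ties: alphabetical
        });
        return new ArrayList<>(terms.subList(0, Math.min(x, terms.size())));
    }

    // Simulates an all-optional (OR) query built from the top x terms: every other
    // document scores the sum of idf over the query terms it contains.
    public static List<Integer> similar(List<String[]> corpus, int docId, int x) {
        List<String> query = topTerms(corpus, docId, x);
        Map<String, Integer> df = docFreqs(corpus);
        int n = corpus.size();
        Map<Integer, Double> scores = new HashMap<>();
        for (int d = 0; d < n; d++) {
            if (d == docId) continue; // never return the source document itself
            Set<String> docTerms = new HashSet<>(Arrays.asList(corpus.get(d)));
            double s = 0.0;
            for (String q : query) {
                if (docTerms.contains(q)) s += Math.log((double) n / df.get(q));
            }
            if (s > 0.0) scores.put(d, s); // keep only documents matching some term
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a)); // best match first
            return c != 0 ? c : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        List<String[]> corpus = Arrays.asList(
            "tomcat installation guide install tomcat on the server".split(" "), // 0
            "jboss installation guide install jboss on the server".split(" "),   // 1
            "tomcat configuration reference".split(" "),                         // 2
            "cooking the pasta on the stove".split(" "));                        // 3
        System.out.println(topTerms(corpus, 0, 3)); // prints [tomcat, guide, install]
        System.out.println(similar(corpus, 0, 3));  // prints [1, 2]
    }
}
```

With the toy corpus in main, the JBoss installation guide ranks ahead of the Tomcat configuration reference. That is the kind of "similar installation guide, not necessarily about Tomcat" result the question asks for. Common words such as "the" and "on" get low weights because they appear in most documents, which is one way to avoid removing common terms by hand.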