The Lucene MoreLikeThis tool in lucene/contrib/similar will do one variant of what you want.
You can do this particular test in Solr- you'll find it much much easier to put together. For other text similarities, you'll have to code them directly. Lance On Sat, Nov 13, 2010 at 7:07 AM, Shashi Kant <sk...@sloan.mit.edu> wrote: > There are multiple measures of similarity for documents: Cosine similarity > is a frequently used one. > > > On Sat, Nov 13, 2010 at 9:23 AM, Ciprian URSU <ursu....@gmail.com> wrote: > >> Hi Guys, >> >> I just find out about Lucene; after reading the main things on wiki >> it seems to be a great tool, but I still didn't find out how can I use it >> for my needs. What I want to do is a small tool which has some documents >> (mainly text) inside and then when I have a new document as input, to >> compare it with all those which are stored and to give me back as a >> percentage of similarity. I have read this part: >> http://wiki.apache.org/lucene-java/ScoresAsPercentages but it is not yet >> very clear to me how to use Lucene for that. Is it possible that some of >> you >> have a sample code for that? >> Thanks a lot, and I apologize for the fact that for many of you this >> looks like a stupid post :). >> >> Best Regards, >> Ciprian. >> > -- Lance Norskog goks...@gmail.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org