There are multiple measures of similarity for documents; cosine similarity is a frequently used one.
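
To make that concrete, here is a minimal sketch of cosine similarity over raw term counts, in plain Java. It does not use any Lucene API; the class and method names (CosineSimilaritySketch, termFrequencies, cosine, similarityPercent) are made up for illustration, and the tokenizer is a naive split on non-word characters. A real setup would more likely weight terms (for example with TF-IDF) and use an analyzer for tokenization.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
final class CosineSimilaritySketch {

    // Build a term-frequency vector: lowercase the text and split on
    // non-word characters (a naive, ASCII-oriented tokenizer).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Term counts are never negative, so the cosine lies in [0, 1]
    // and can be scaled to a percentage.
    static double similarityPercent(String docA, String docB) {
        return 100.0 * cosine(termFrequencies(docA), termFrequencies(docB));
    }

    public static void main(String[] args) {
        String stored = "Lucene is a search library";
        String input = "Lucene is a text search engine library";
        System.out.printf(Locale.ROOT, "similarity: %.1f%%%n",
                similarityPercent(stored, input));
    }
}
```

To compare a new document against a stored collection, you could call similarityPercent once per stored document and sort by the result. That is linear in the number of documents, so it only suits small collections.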

On Sat, Nov 13, 2010 at 9:23 AM, Ciprian URSU <ursu....@gmail.com> wrote:

> Hi Guys,
>
> I just find out about Lucene; after reading the main things on wiki
> it seems to be a great tool, but I still didn't find out how can I use it
> for my needs. What I want to do is a small tool which has some documents
> (mainly text) inside and then when I have a new document as input, to
> compare it with all those which are stored and to give me back as a
> percentage of similarity. I have read this part:
> http://wiki.apache.org/lucene-java/ScoresAsPercentages but it is not yet
> very clear to me how to use Lucene for that. Is it possible that some of
> you have a sample code for that?
> Thanks a lot, and I apologize for the fact that for many of you this
> looks like a stupid post :).
>
> Best Regards,
> Ciprian.