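Hi,

The logic you are looking for is lemmatization: http://en.wikipedia.org/wiki/Lemmatisation. I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is an open source project: http://gate.ac.uk (plugin documentation: http://gate.ac.uk/gate/doc/plugins.html). The idea is to reduce each token to its lemma before it is indexed, so that "owe", "owed" and "owing" all end up as the single term "owe".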
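To make the idea concrete, here is a minimal sketch in plain Java. It does not call Lucene or GATE at all: the class name LemmaTermFrequency, the method names and the tiny hand-written lemma table are all hypothetical and only stand in for whatever a real lemmatizer would return. It shows how mapping each token to its lemma before counting collapses the three forms into one term, and how the cosine similarity is then computed on the merged counts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LemmaTermFrequency {

    // Hypothetical lookup table standing in for a real lemmatizer's output.
    private static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("owed", "owe");
        LEMMAS.put("owing", "owe");
        LEMMAS.put("owes", "owe");
    }

    // Returns the lemma for a token, or the lowercased token if it is not in the table.
    public static String lemma(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Counts term frequencies after mapping every token to its lemma.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            tf.merge(lemma(token), 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency maps; 0.0 if either is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) fa * fb;
            }
        }
        for (int fb : b.values()) {
            normB += (double) fb * fb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // All three surface forms are counted under the single term "owe".
        System.out.println(termFrequencies("owe owed owing")); // prints {owe=3}
        // Documents that differ only in inflection now point in the same direction.
        System.out.println(cosine(termFrequencies("owe owed"), termFrequencies("owing")));
    }
}
```

In a real index the lookup table would be replaced by the output of an actual lemmatizer such as the one in GATE, applied to the token stream before the terms are written. If exact lemmas are not required, one of Lucene's stemming filters (for example PorterStemFilter) may be a close enough approximation.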
Enjoy!

-----Original Message-----
From: Kasun Perera [mailto:kas...@opensource.lk]
Sent: Saturday, April 28, 2012 6:03 AM
To: java-user@lucene.apache.org
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between documents. Say my documents have these 3 terms: "owe", "owed", "owing". Lucene treats these as 3 separate terms, but all 3 of them mean the same thing, "owe". Is there any functionality in Lucene that can be used to index by semantics, so that it indexes "owe", "owed", "owing" as one word "owe" with term frequency = 3? If not, I'd welcome any suggestions for achieving this.

--
Regards
Kasun Perera