Hi,

What you are looking for is lemmatization (http://en.wikipedia.org/wiki/Lemmatisation): reducing inflected forms such as "owed" and "owing" to their dictionary form, "owe". I don't think Lucene has a built-in lemmatizer, but you can use GATE, an open-source project:
http://gate.ac.uk
http://gate.ac.uk/gate/doc/plugins.html
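To show roughly where lemmatization fits into a cosine-similarity pipeline, here is a minimal plain-Java sketch. It uses no Lucene or GATE APIs. It assumes a tiny hand-written lemma table (`LEMMAS`, hypothetical) standing in for a real lemmatizer, and the class and method names are likewise made up for illustration. Inflected forms are folded onto their lemma *before* term frequencies are counted, so "owe", "owed" and "owing" all contribute to the single term "owe".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: LEMMAS is a hypothetical toy dictionary. In practice the
// token -> lemma mapping would come from a real lemmatizer (e.g. via GATE).
class LemmaTfSketch {

    private static final Map<String, String> LEMMAS = Map.of(
            "owed", "owe",
            "owing", "owe",
            "owes", "owe");

    // Map a lowercased token to its lemma; unknown tokens pass through unchanged.
    static String lemmatize(String token) {
        return LEMMAS.getOrDefault(token, token);
    }

    // Lowercase, split on runs of non-letters, lemmatize each token, count occurrences.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (raw.isEmpty()) continue; // "".split(...) yields one empty token
            tf.merge(lemmatize(raw), 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        double normA = 0, normB = 0;
        for (int v : a.values()) normA += (double) v * v;
        for (int v : b.values()) normB += (double) v * v;
        if (normA == 0 || normB == 0) return 0.0; // empty document
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("I owe you. I owed you. Owing you.");
        System.out.println(tf.get("owe")); // 3
        // Both documents reduce to {owe=1, money=1}, so this is approximately 1.0.
        System.out.println(cosine(termFrequencies("owed money"),
                                  termFrequencies("owing money")));
    }
}
```

Without the lemma table, "owed money" and "owing money" share only "money" and would score 0.5. With Lucene, the same mapping would presumably need to happen at analysis time, so that the stored term vectors already contain "owe" rather than its inflected forms.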
Enjoy!

-----Original Message-----
From: Kasun Perera [mailto:[email protected]]
Sent: Saturday, April 28, 2012 6:03 AM
To: [email protected]
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between documents. Say my documents contain these 3 terms: "owe", "owed", "owing". Lucene treats them as 3 separate terms, but all 3 mean the same thing, "owe". Is there any functionality in Lucene that can be used to index by semantics, so that "owe", "owed" and "owing" are indexed as one word, "owe", with term frequency = 3?

If not, I'd welcome any suggestions for achieving this.

--
Regards
Kasun Perera
