Hi,
What you are looking for is lemmatization, i.e. reducing inflected forms such
as "owed" and "owing" to their base form "owe":
http://en.wikipedia.org/wiki/Lemmatisation
I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is
an open source project:
http://gate.ac.uk
http://gate.ac.uk/gate/doc/plugins.html
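
To make the idea concrete, here is a minimal sketch in plain Python (not Lucene
or GATE code) of what lemmatizing before counting does to the term frequencies
and to the cosine similarity. The LEMMAS table and the function names are
made up for illustration; a real lemmatizer would replace the lookup table,
and in Lucene the normalization would normally happen in the analyzer at
index time.

```python
import math
from collections import Counter

# Hypothetical lookup table, for illustration only. A real lemmatizer
# (for example one provided by a GATE plugin) would take its place.
LEMMAS = {"owed": "owe", "owing": "owe", "owes": "owe"}


def lemmatize(token):
    """Map a token to its lemma; unknown tokens are just lowercased."""
    token = token.lower()
    return LEMMAS.get(token, token)


def term_frequencies(text):
    """Count lemmas rather than surface forms (whitespace tokenization)."""
    return Counter(lemmatize(token) for token in text.split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency vectors."""
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[term] * tf_b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    tf = term_frequencies("owe owed owing")
    print(tf["owe"])  # 3: the three forms collapse into one term
```

Without the lemmatize step the same input would give three terms with
frequency 1 each, so two documents using different forms of "owe" would
share no terms at all.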

Enjoy!



-----Original Message-----
From: Kasun Perera [mailto:[email protected]] 
Sent: Saturday, April 28, 2012 6:03 AM
To: [email protected]
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity
between documents. Say my documents have these 3 terms: "owe", "owed",
"owing". Lucene treats these as 3 separate terms, but all 3 of them mean the
same thing, "owe". Is there any functionality in Lucene that can be used to
index by semantics, so that it indexes "owe", "owed" and "owing" as one word
"owe" with term frequency = 3?

If not, I'd welcome any suggestions for achieving this.

--
Regards

Kasun Perera

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
