Hi,
The technique you are looking for is lemmatization:
http://en.wikipedia.org/wiki/Lemmatisation
I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is an
open source project:
http://gate.ac.uk
http://gate.ac.uk/gate/doc/plugins.html
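
To make the idea concrete, here is a minimal sketch in plain Java of what 
lemmatizing before counting gives you. It does not use the Lucene or GATE APIs: 
the class name LemmaTermFrequency and its tiny hand-written lemma table are 
assumptions made up for illustration, standing in for whatever a real 
lemmatizer would return. It collapses each token to its lemma, counts term 
frequencies, and computes cosine similarity over the resulting vectors.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: this class and its lookup table are hypothetical,
// standing in for a real lemmatizer (for example one provided by GATE).
class LemmaTermFrequency {

    // Hypothetical hand-written lemma table; a real lemmatizer would supply this mapping.
    private static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("owed", "owe");
        LEMMAS.put("owing", "owe");
    }

    // Map a token to its lemma, falling back to the lower-cased token itself.
    static String lemma(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Count term frequencies after collapsing each token to its lemma.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(lemma(token), 1, Integer::sum);
        }
        return tf;
    }

    // Euclidean length of a sparse term-frequency vector.
    private static double norm(Map<String, Integer> v) {
        double sumOfSquares = 0.0;
        for (int count : v.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        // The three surface forms from the question end up under one term.
        Map<String, Integer> doc = termFrequencies(Arrays.asList("owe", "owed", "owing"));
        System.out.println(doc);

        // Two documents that share no surface form but share a lemma.
        Map<String, Integer> a = termFrequencies(Arrays.asList("owe", "owed"));
        Map<String, Integer> b = termFrequencies(Arrays.asList("owing"));
        System.out.println(cosine(a, b));
    }
}
```

In a real setup the lemma lookup would happen at analysis time, before the 
terms reach the index, so that the term frequency vector already holds "owe" 
with a count of 3.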

Enjoy!



-----Original Message-----
From: Kasun Perera [mailto:kas...@opensource.lk] 
Sent: Saturday, April 28, 2012 6:03 AM
To: java-user@lucene.apache.org
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between 
documents. Say my documents have these 3 terms: "owe", "owed", "owing". Lucene 
treats these as 3 separate terms, but all 3 of them mean the same thing, "owe". 
Is there any functionality in Lucene that can be used to index by semantics, so 
that it indexes "owe", "owed", "owing" as one word "owe" with term frequency = 3?

If not, I'd welcome any suggestions for achieving this task.

--
Regards

Kasun Perera

