Perhaps "least frequent substring" or even "suffix truncation" might be enough for your needs.
Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf karl On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote: > You're right. Still, I am not sure if there is a library that would > take care of examples such as the one I gave. > > On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon <[email protected]> wrote: >> Hi, >> >>> >>> Is there something in Lucene that supports lemmatization of the following >>> form: >>> >>> Mexican --> Mexico (from adjective to name/noune) >>> >>> Lemmatization do not change part of speech. I think you are looking for a >> stemming algorithm. >> >> http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html >> >>> >>> >>> Thanks, >> Lahiru >> > > > > -- > Mohamed Yahya > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
