Re: [lingu-dev] Re: [dev] Contributions

Marcin Miłkowski Wed, 27 Jul 2005 14:11:11 -0700

Daniel Naber wrote:

The thesaurus would benefit from code that can find the base form for any word. E.g. walked -> walk, children -> child. This could be plugged into the existing thesaurus code easily, it's basically just one method like "getBaseform(String)". Of course it would need to support several languages. Some languages are very irregular, this also needs to be handled efficiently.

This is already done, Daniel. Look at hunmorph in the hunspell package: this program is just a stemmer using myspell/hunspell-formatted dictionaries.

The task is to integrate the thesaurus code with appropriate calls to the functions hunmorph uses.
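
To make the integration concrete, here is a minimal sketch of a thesaurus lookup that falls back to the base form when the word itself has no entry. Everything in it is an assumption for illustration: get_baseform stands in for the "getBaseform(String)" method mentioned above, the tiny irregular table and suffix rules are a toy substitute for what hunmorph derives from its dictionaries, and none of the names are hunmorph or thesaurus API.

```python
# Hypothetical sketch: these names are NOT part of hunmorph or the thesaurus code.
# A toy stemmer stands in for the calls a real integration would make to hunmorph.

# Irregular forms per language, checked first so lookups stay a single dict hit.
IRREGULAR = {"en": {"children": "child", "went": "go"}}

# Very small suffix-stripping rules per language: (suffix, replacement).
SUFFIX_RULES = {"en": [("ed", ""), ("s", "")]}

# Toy thesaurus data keyed by base form.
THESAURUS = {"walk": ["stroll", "hike"], "child": ["kid", "youngster"]}


def get_baseform(word, lang="en"):
    """Return a guessed base form of `word`, or the lowercased word itself."""
    lower = word.lower()
    irregular = IRREGULAR.get(lang, {})
    if lower in irregular:
        return irregular[lower]
    for suffix, replacement in SUFFIX_RULES.get(lang, []):
        # Keep at least three characters of stem to avoid mangling short words.
        if lower.endswith(suffix) and len(lower) > len(suffix) + 2:
            return lower[: len(lower) - len(suffix)] + replacement
    return lower


def lookup_synonyms(word, lang="en"):
    """Look up the word as typed first, then fall back to its base form."""
    key = word.lower()
    if key in THESAURUS:
        return THESAURUS[key]
    return THESAURUS.get(get_baseform(word, lang), [])
```

With this shape the thesaurus code only needs one extra call on a failed lookup, and the per-language tables could be swapped for whatever the stemmer actually provides.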

BTW, an easier thing to start with would be to check how the thesaurus code can be modified so it supports UTF-8. A standalone version of the thesaurus code is available at http://lingucomponent.openoffice.org/thesaurus.html

Yeah, that should be done - especially because hunspell supports multibyte characters.
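
As a rough illustration of what UTF-8 support involves, the sketch below decodes the raw bytes of a thesaurus data file before splitting it into entries, so that a multibyte character is never cut in half. The pipe-delimited layout ("word|count" followed by that many "pos|synonym|..." lines) is a simplified format assumed for this example, and parse_thesaurus and SAMPLE are hypothetical names, not part of the existing thesaurus code.

```python
# Hypothetical sketch: a simplified, assumed data layout, not the real file reader.

def parse_thesaurus(data, encoding="utf-8"):
    """Decode raw bytes first, then split into entries on character boundaries."""
    entries = {}
    lines = data.decode(encoding).splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between entries
            continue
        word, count = lines[i].split("|")
        n = int(count)
        # Each of the next n lines is one meaning: part of speech, then synonyms.
        entries[word] = [lines[i + 1 + k].split("|") for k in range(n)]
        i += 1 + n
    return entries


# Toy data containing one non-ASCII headword (U+00EF, two bytes in UTF-8).
SAMPLE = "na\u00efve|1\n(adj)|simple|innocent\n".encode("utf-8")

WORD = "na\u00efve"
CHAR_LENGTH = len(WORD)                  # counts characters
BYTE_LENGTH = len(WORD.encode("utf-8"))  # counts bytes, which is larger here
```

The difference between CHAR_LENGTH and BYTE_LENGTH is the thing to watch for: any code that indexes or compares words by byte count will need checking once multibyte input is allowed.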


Regards,
Marcin
