On Tuesday 26 July 2005 22:14, Valliappan Annamalai wrote:
> I would like to work on "Design and build code that will combine
> information on prefixes and suffixes".

The thesaurus would benefit from code that can find the base form of any
word, e.g. walked -> walk, children -> child. This could be plugged into
the existing thesaurus code easily; it's basically just one method like
"getBaseform(String)". Of course it would need to support several
languages. Some languages are very irregular, and this also needs to be
handled efficiently.

BTW, an easier thing to start with would be to check how the thesaurus
code can be modified so that it supports UTF-8. A standalone version of
the thesaurus code is available at
http://lingucomponent.openoffice.org/thesaurus.html

Regards
 Daniel

-- 
http://www.danielnaber.de
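
To make the "getBaseform(String)" idea above a bit more concrete, here is a
minimal sketch in Java. Everything except the method name is an assumption
made for illustration: the class name BaseformFinder, the englishDemo()
helper, and the tiny rule tables are hypothetical and are not part of the
existing thesaurus code. Irregular forms are looked up in a hash map first
(constant time, which is one way to handle irregular languages
efficiently), and only then are simple suffix rules tried.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the thesaurus API: one instance per language.
class BaseformFinder {
    private final Locale locale;
    // Irregular forms mapped directly to their base form, e.g. children -> child.
    private final Map<String, String> irregular;
    // Suffix -> replacement, tried in insertion order.
    private final Map<String, String> suffixRules;

    BaseformFinder(Locale locale, Map<String, String> irregular,
                   Map<String, String> suffixRules) {
        this.locale = locale;
        this.irregular = irregular;
        this.suffixRules = suffixRules;
    }

    // Tiny English rule set for demonstration only; a real one would be
    // loaded from per-language data files.
    static BaseformFinder englishDemo() {
        Map<String, String> irregular = new LinkedHashMap<>();
        irregular.put("children", "child");
        irregular.put("went", "go");
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("ies", "y");
        rules.put("ed", "");
        rules.put("ing", "");
        rules.put("s", "");
        return new BaseformFinder(Locale.ENGLISH, irregular, rules);
    }

    String getBaseform(String word) {
        String w = word.toLowerCase(locale);
        // 1. Exceptions win over rules.
        String exception = irregular.get(w);
        if (exception != null) {
            return exception;
        }
        // 2. First matching suffix rule that leaves a stem of at least 3 letters.
        for (Map.Entry<String, String> rule : suffixRules.entrySet()) {
            String suffix = rule.getKey();
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= 3) {
                return w.substring(0, stemLength) + rule.getValue();
            }
        }
        // 3. No rule applied: treat the word as its own base form.
        return w;
    }
}
```

With this demo rule set, getBaseform("walked") gives "walk" and
getBaseform("children") gives "child". The suffix rules are deliberately
naive ("loved" would become "lov"), so a real version would need proper
per-language rules or a dictionary.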
