Hi,

I guess I forgot to mention: half a year ago I made a demo version of the standalone MyThes thesaurus with stemming and morphological generation. It doesn't handle multiword expressions or general categories before parentheses, like the code in the CWS "hunspell4thesaurus", but it may be useful for dictionary developers:
http://downloads.sourceforge.net/hunspell/MyThes-1.1.tar.gz

See README.NEW and README for compiling.

Test example

Make an input.txt file with two lines, "rodents" and "consumed", and
run MyThes with the test dictionary:

./example morph.idx morph.dat input.txt morph.aff morph.dic
Thesaurus uses encoding ISO8859-1

stem: rodent
rodent has 1 meanings
meaning 0: (n) mouse mice

stem: consume
consume has 1 meanings
meaning 0: (v) eat eaten, ate ingested

The example Hunspell dictionary (meanings of the morphological fields:
po: part of speech category
ts: terminal suffix
al: allomorph
st: stem
is: inflectional suffix,
see 
http://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754#Morphological%20analysis):

$ cat morph.dic
8
rodent/S po:n ts:nom
mouse po:n al:mice ts:nom
mice po:n st:mouse is:plur
consume/TQD po:v ts:present
ingest/TQD po:v ts:present
eat/QT po:v al:ate al:eaten ts:present
ate po:v st:eat is:past_1
eaten po:v st:eat is:past_2

$ cat morph.aff
# example for morphological analysis, stemming and generation
SFX D Y 4
SFX D 0 ed [^e] is:past_1
SFX D 0 d e is:past_1
SFX D 0 ed [^e] is:past_2
SFX D 0 d e is:past_2

SFX S Y 1
SFX S 0 s . is:plur

SFX Q Y 1
SFX Q 0 s . is:sg_3

SFX T Y 2
SFX T 0 ing [^e] is:pr_part
SFX T e ing e is:pr_part

and the thesaurus (without any extra morphological information):

$ cat morph.dat
ISO8859-1
mouse|1
(n)|rodent
rodent|1
(n)|mouse
eat|1
(v)|consume|ingest
consume|1
(v)|eat|ingest
ingest|1
(v)|eat|consume

Regards,
Laci

2008/6/23 Németh László <[EMAIL PROTECTED]>:
> Hi Daniel,
>
> 2008/6/20 Daniel Naber <[EMAIL PROTECTED]>:
>> On Friday, 20 June 2008, Németh László wrote:
>>
>>> "hunspell4thesaurus" contains Hunspell 1.2.4 and a thesaurus patch to
>>> use Hunspell for stemming of the selected words and morphological
>>> generation of the synonyms in OpenOffice.org 3.
>>
>> Hi Laci,
>>
>> thank you, that's great news! Please keep this list up-to-date about when
>> this is available in a new build (because it can be quite difficult to
>> follow the changes in the release notes).
>
> The CWS hunspell4thesaurus (and CWS hyphenator3 with the new compound
> word hyphenation support) are finished and tested on my Linux, but QA
> needs Linux and Windows test builds, too. I have no Windows build
> environment, and it seems, my recent Linux test builds have some
> problems
> (http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fhunspell4thesaurus),
> so any help welcome.
> I hope, within a few days I will have a newer Linux build environment
> and I could send a link to a working Linux test build to the list.
> (But the standalone version of Hunspell is suitable for the dictionary
> development.)
>
> Regards,
> Laci
>
>>
>> Regards
>> Daniel
>>
>> --
>> http://www.danielnaber.de
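
For dictionary developers who want to see the lookup flow without building anything, here is a rough Python sketch of the idea: parse a thesaurus in the morph.dat layout shown above, and fall back to the stem when the word itself has no entry. This is not the MyThes code. The STEMS table is a hypothetical, hand-written stand-in for Hunspell stemming that covers only the two test words, and the function names are made up for illustration. Morphological generation of the synonyms is left out.

```python
# Thesaurus data copied from the morph.dat example in this mail.
MORPH_DAT = """ISO8859-1
mouse|1
(n)|rodent
rodent|1
(n)|mouse
eat|1
(v)|consume|ingest
consume|1
(v)|eat|ingest
ingest|1
(v)|eat|consume
"""

# Hypothetical stand-in for Hunspell stemming (assumption, not the real API):
# a fixed table covering only the two words from input.txt.
STEMS = {"rodents": "rodent", "consumed": "consume"}


def parse_thesaurus(text):
    """Return (encoding, {word: [(category, [synonyms]), ...]})."""
    lines = text.strip().splitlines()
    encoding, body = lines[0], lines[1:]
    entries = {}
    i = 0
    while i < len(body):
        # Entry header: "word|number_of_meanings"
        word, count = body[i].split("|")
        count = int(count)
        meanings = []
        # Each meaning line: "(category)|synonym|synonym|..."
        for line in body[i + 1 : i + 1 + count]:
            category, *synonyms = line.split("|")
            meanings.append((category, synonyms))
        entries[word] = meanings
        i += 1 + count
    return encoding, entries


def lookup(word, entries, stems=STEMS):
    """Look up the word itself first, then its stem if there is one."""
    if word in entries:
        return word, entries[word]
    stem = stems.get(word)
    if stem in entries:
        return stem, entries[stem]
    return None, []


encoding, entries = parse_thesaurus(MORPH_DAT)
for word in ("rodents", "consumed"):
    stem, meanings = lookup(word, entries)
    print(word, "->", stem, meanings)
```

In the real demo the stem comes from Hunspell using morph.aff and morph.dic, so no table like STEMS is needed there.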
