Hi,

I forgot to mention that half a year ago I made a demo version of the
standalone MyThes thesaurus with stemming and morphological
generation. It doesn't handle multiword expressions or general
categories before parenthesis, like the code in the CWS
"hunspell4thesaurus", but it may be useful for dictionary developers:

http://downloads.sourceforge.net/hunspell/MyThes-1.1.tar.gz

See README.NEW and README for compiling.

Test example

Make an input.txt file with two lines, "rodents" and "consumed", and
run MyThes with the test dictionary:

./example morph.idx morph.dat input.txt morph.aff morph.dic

Thesaurus uses encoding ISO8859-1

stem: rodent
rodent has 1 meanings
   meaning 0: (n) mouse
       mice

stem: consume
consume has 1 meanings
   meaning 0: (v) eat
       eaten, ate
       ingested

The example Hunspell dictionary (meanings of the morphological fields:
po: part of speech category
ts: terminal suffix
al: allomorph
st: stem
is: inflectional suffix, see
http://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754#Morphological%20analysis):

$ cat morph.dic
8
rodent/S        po:n        ts:nom
mouse   po:n    al:mice ts:nom
mice    po:n st:mouse        is:plur
consume/TQD     po:v ts:present
ingest/TQD      po:v ts:present
eat/QT  po:v    al:ate  al:eaten        ts:present
ate     po:v    st:eat  is:past_1
eaten   po:v    st:eat  is:past_2
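
To make the field layout of the listing above concrete, here is a minimal Python sketch that splits one .dic entry into the word, its affix flags and its morphological fields. It is only an illustration, not part of MyThes or Hunspell: the function name parse_dic_line is hypothetical, and it assumes single-character flags and whitespace-separated "key:value" fields, as in this example dictionary.

```python
def parse_dic_line(line):
    """Split one .dic entry into word, affix flags and morphological fields."""
    parts = line.split()
    head, fields = parts[0], parts[1:]
    # "eat/QT" -> word "eat", flags "QT"; entries without "/" get empty flags
    word, _, flags = head.partition("/")
    morph = {}
    for field in fields:
        key, _, value = field.partition(":")
        # a key such as "al" may occur several times, so collect the values
        morph.setdefault(key, []).append(value)
    return {"word": word, "flags": flags, "morph": morph}


entry = parse_dic_line("eat/QT  po:v    al:ate  al:eaten        ts:present")
print(entry)
# {'word': 'eat', 'flags': 'QT', 'morph': {'po': ['v'], 'al': ['ate', 'eaten'], 'ts': ['present']}}
```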

$ cat morph.aff
# example for morphological analysis, stemming and generation
SFX D Y 4
SFX D   0 ed [^e] is:past_1
SFX D   0 d e     is:past_1
SFX D   0 ed [^e] is:past_2
SFX D   0 d e     is:past_2

SFX S Y 1
SFX S   0 s . is:plur

SFX Q Y 1
SFX Q   0 s . is:sg_3

SFX T Y 2
SFX T   0 ing [^e] is:pr_part
SFX T   e ing e    is:pr_part
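
As a rough illustration of how the suffix rules above produce inflected forms, the following Python sketch applies them to a stem. It is not Hunspell's implementation: the RULES table and the generate function are hypothetical names, the rules are copied by hand from the listing, and the condition field is treated as a plain regular expression anchored at the end of the stem.

```python
import re

# (flag, strip, add, condition, morphological field), copied from morph.aff
RULES = [
    ("D", "0", "ed", "[^e]", "is:past_1"),
    ("D", "0", "d", "e", "is:past_1"),
    ("D", "0", "ed", "[^e]", "is:past_2"),
    ("D", "0", "d", "e", "is:past_2"),
    ("S", "0", "s", ".", "is:plur"),
    ("Q", "0", "s", ".", "is:sg_3"),
    ("T", "0", "ing", "[^e]", "is:pr_part"),
    ("T", "e", "ing", "e", "is:pr_part"),
]


def generate(stem, flags):
    """Return (form, morph) pairs for every rule that applies to the stem."""
    forms = []
    for flag, strip, add, cond, morph in RULES:
        if flag not in flags:
            continue
        # the condition must match the end of the stem
        if not re.search("(?:" + cond + ")$", stem):
            continue
        if strip == "0":
            base = stem  # "0" means nothing is stripped
        elif stem.endswith(strip):
            base = stem[: len(stem) - len(strip)]
        else:
            continue
        forms.append((base + add, morph))
    return forms


for form, morph in generate("consume", "TQD"):
    print(form, morph)
# consumed is:past_1
# consumed is:past_2
# consumes is:sg_3
# consuming is:pr_part
```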

The thesaurus (without any extra morphological information):

$ cat morph.dat
ISO8859-1
mouse|1
(n)|rodent
rodent|1
(n)|mouse
eat|1
(v)|consume|ingest
consume|1
(v)|eat|ingest
ingest|1
(v)|eat|consume
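
The layout of the thesaurus file above (an encoding line, then for each entry a "word|count" line followed by count meaning lines of the form "(category)|synonym|synonym") can be read with a few lines of Python. This is only a sketch under that assumption: parse_thesaurus is a hypothetical helper, and it reads the .dat text sequentially instead of using the .idx file as MyThes does.

```python
def parse_thesaurus(text):
    """Parse the .dat layout shown above into {word: [(category, synonyms)]}."""
    lines = text.strip().splitlines()
    encoding, body = lines[0], lines[1:]
    entries = {}
    i = 0
    while i < len(body):
        # entry header: "word|number_of_meanings"
        word, _, count = body[i].partition("|")
        count = int(count)
        meanings = []
        for row in body[i + 1 : i + 1 + count]:
            # meaning line: "(category)|synonym1|synonym2..."
            category, *synonyms = row.split("|")
            meanings.append((category, synonyms))
        entries[word] = meanings
        i += 1 + count
    return encoding, entries


MORPH_DAT = """\
ISO8859-1
mouse|1
(n)|rodent
rodent|1
(n)|mouse
eat|1
(v)|consume|ingest
consume|1
(v)|eat|ingest
ingest|1
(v)|eat|consume
"""

encoding, thesaurus = parse_thesaurus(MORPH_DAT)
print(encoding)              # ISO8859-1
print(thesaurus["consume"])  # [('(v)', ['eat', 'ingest'])]
```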

Regards,
Laci

2008/6/23 Németh László <[EMAIL PROTECTED]>:
> Hi Daniel,
>
> 2008/6/20 Daniel Naber <[EMAIL PROTECTED]>:
>> On Freitag, 20. Juni 2008, Németh László wrote:
>>
>>> "hunspell4thesaurus" contains Hunspell 1.2.4 and a thesaurus patch to
>>> use Hunspell for stemming of the selected words and morphological
>>> generation of the synonyms in OpenOffice.org 3.
>>
>> Hi Laci,
>>
>> thank you, that's great news! Please keep this list up-to-date about when
>> this is available in a new build (because it can be quite difficult to
>> follow the changes in the release notes).
>
> The CWS hunspell4thesaurus (and CWS hyphenator3 with the new compound
> word hyphenation support) are finished and tested on my Linux system,
> but QA needs Linux and Windows test builds, too. I have no Windows
> build environment, and it seems my recent Linux test builds have some
> problems
> (http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fhunspell4thesaurus),
> so any help is welcome.
> I hope that within a few days I will have a newer Linux build
> environment and can send a link to a working Linux test build to the
> list. (But the standalone version of Hunspell is suitable for
> dictionary development.)
>
> Regards,
> Laci
>
>
>
>>
>> Regards
>>  Daniel
>>
>> --
>> http://www.danielnaber.de
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
