On 18:50 Sun 25 Sep, ge wrote:
> However, affixing has 2 parts:
> 1. create an affix file
> 2. add the proper affixes to the individual words in the dictionary file.
> 
> I completely miss 2. in gramadoir.

Yes, I have some of step 2 in place for Gramadóir, though
I admit this part is completely undocumented!

At this stage it amounts to a simple-minded
Perl script that takes as input an affix file
and a large plain-text corpus in the
target language (from http://borel.slu.edu/crubadan/).
For each flag in the affix file, it applies
the rules under that flag *in reverse*
(i.e. strips affixes) to all of
the words it sees in the corpus and looks for
"root words" that many of those rules lead back to.
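The Perl script itself isn't posted here, so what follows is only a minimal Python sketch of the reverse-application step, not the script's actual code. It assumes a simplified ispell/MySpell-style suffix rule (strip, add, condition): the forward rule takes a root whose ending matches `condition`, removes `strip`, and appends `add`. Prefixes, cross-products and parsing a real affix file are left out, and the names `Rule`, `reverse_apply` and `candidate_roots` are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """Simplified suffix rule (assumed format): root -> root minus `strip`, plus `add`."""
    strip: str      # characters removed from the root ("" for none)
    add: str        # suffix appended to form the surface word
    condition: str  # regex the root must match at its end ("." for any)


def reverse_apply(word: str, rule: Rule) -> Optional[str]:
    """Undo one rule: return the root that would generate `word`, or None."""
    if not word.endswith(rule.add) or len(word) <= len(rule.add):
        return None
    root = word[: len(word) - len(rule.add)] + rule.strip
    # The forward rule only fires on roots whose ending matches the condition.
    if re.search("(?:" + rule.condition + ")$", root) is None:
        return None
    return root


def candidate_roots(corpus: set[str], rules: list[Rule]) -> dict[str, set[int]]:
    """Map each candidate root to the indices of the rules that lead back to it."""
    roots: dict[str, set[int]] = defaultdict(set)
    for word in corpus:
        for i, rule in enumerate(rules):
            root = reverse_apply(word, rule)
            if root is not None:
                roots[root].add(i)
    return dict(roots)
```

With two toy English rules, `Rule("y", "ies", "[^aeiou]y")` and `Rule("", "s", "[^sxy]")`, the corpus `{"parties", "cats"}` yields `party` (rule 0) and `cat` (rule 1), but also the bogus root `partie` (rule 1). That kind of noise is why counting how many rules support each root matters.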

For example, imagine there are 6 rules under flag "A",
and I find words like "grokker", "grokking", "grokked",
"grokkish", "grokalicious" in the corpus
such that 5 out of 6 of the flag A rules apply
to give the root "grok".  Then it might be safe to add
"grok/A" to the word list.  
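To make the arithmetic concrete, here is a self-contained toy version of that example in Python. The six flag-A rules are invented for illustration: a hypothetical affix file where the doubled "k" is handled by requiring the root to end in "k". Only the 5-of-6 outcome comes from the text. Once a root has been proposed, it checks the rules forwards and counts how many of them produce a word that was actually seen.

```python
import re

# Hypothetical flag-A suffix rules: (strip, add, condition on the root's ending).
FLAG_A = [
    ("", "ker", "k"),        # grok -> grokker
    ("", "king", "k"),       # grok -> grokking
    ("", "ked", "k"),        # grok -> grokked
    ("", "kish", "k"),       # grok -> grokkish
    ("", "alicious", "."),   # grok -> grokalicious
    ("", "s", "[^s]"),       # grok -> groks (not in the corpus)
]

corpus = {"grokker", "grokking", "grokked", "grokkish", "grokalicious"}


def generate(root, rule):
    """Apply one rule forwards; return the derived word, or None if it can't apply."""
    strip, add, condition = rule
    if not root.endswith(strip) or re.search("(?:" + condition + ")$", root) is None:
        return None
    return root[: len(root) - len(strip)] + add


def coverage(root, rules, corpus):
    """Return (rules whose output occurs in the corpus, total rules under the flag)."""
    hits = sum(1 for rule in rules if generate(root, rule) in corpus)
    return hits, len(rules)


hits, total = coverage("grok", FLAG_A, corpus)
print(f"grok/A  {hits}/{total} rules")  # prints: grok/A  5/6 rules
```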

These candidates can be ranked by the percentage of
the flag's rules that apply (5 of 6 for "grok");
either way, it's usually a good idea to check the output
manually.  We've had some luck with this approach for Basque,
which has rich morphology.
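The ranking step can be sketched in Python roughly as follows, assuming the per-root rule counts have already been collected for one flag. The 0.5 cut-off and the example counts (including the made-up roots "frob" and "zorch") are hypothetical; the output is meant for a human to review, not to be appended to the word list blindly.

```python
from typing import Dict, List, Tuple


def rank_candidates(matches: Dict[str, int], n_rules: int, flag: str,
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank roots by the share of the flag's rules they satisfy, best first.

    `matches` maps each candidate root to how many distinct rules under `flag`
    produced a word seen in the corpus; `threshold` is an arbitrary cut-off.
    """
    scored = [(f"{root}/{flag}", hits / n_rules) for root, hits in matches.items()]
    kept = [(entry, share) for entry, share in scored if share >= threshold]
    # Highest share first; ties broken alphabetically so the output is stable.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical counts for a flag with 6 rules.
    counts = {"grok": 5, "grokaliciou": 1, "frob": 3, "zorch": 6}
    for entry, share in rank_candidates(counts, n_rules=6, flag="A"):
        print(f"{entry}\t{share:.0%}")
    # zorch/A 100%, grok/A 83%, frob/A 50%; grokaliciou (1/6) is dropped
```

Sorting by share rather than raw hit count keeps flags with different numbers of rules comparable.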

It would be nice to generalize this approach
to work with HunSpell if anyone's feeling
up to the task; I imagine it could
start to get computationally expensive
for large multilevel affix files and large corpora.

I'm guessing I'm not the first to have written
something like this - in fact, maybe Laci et al
already have something like this in HunSpell;
I admit I haven't looked carefully yet.

The other important question is how to construct
the affix file itself automatically from a plain-text
corpus.  This is obviously much harder.
Anyone interested in this question should have
a look at John Goldsmith's Linguistica
project at the Univ. of Chicago.  I've played
around with the demo and it looks promising.
http://linguistica.uchicago.edu/

-Kevin

