On 18:50 Sun 25 Sep, ge wrote:
> However, affixing has 2 parts:
> 1. create an affix file
> 2. add the proper affixes to the individual words in the dictionary file.
>
> I completely miss 2. in gramadoir.

Yes, I have some of 2. in place for Gramadóir, and I admit this part is completely undocumented! At this stage it amounts to a simple-minded Perl script that takes as input an affix file and a large plain-text corpus in the target language (from http://borel.slu.edu/crubadan/). For each flag in the affix file, it applies the rules under that flag *in reverse* (i.e. strips affixes) to all of the words it sees in the corpus and looks for common "root words".

For example, imagine there are 6 rules under flag "A", and I find words like "grokker", "grokking", "grokked", "grokkish", "grokalicious" in the corpus such that 5 out of 6 of the flag A rules apply to give the root "grok". Then it might be safe to add "grok/A" to the word list. These candidates can be ranked by the percentage of a flag's rules that apply, if you like; in any case it's usually a good idea to check the output manually. We've had some luck with this approach for Basque, which has rich morphology.

It would be nice to generalize this approach to work with HunSpell if anyone's feeling up to the task; I imagine it could start to get computationally expensive for large multilevel affix files and large corpora. I'm guessing I'm not the first to have written something like this. In fact, maybe Laci et al. already have something like this in HunSpell; I admit I haven't looked carefully yet.

The other important question is automatically constructing the affix file itself from a plain-text corpus. This is obviously much harder. Anyone interested in this question should have a look at John Goldsmith's Linguistica project at the Univ. of Chicago. I've played around with the demo and it looks promising.

http://linguistica.uchicago.edu/

-Kevin
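
For anyone who wants to experiment, here is a minimal sketch of the reverse-application idea described above. It is in Python rather than Perl and is not the actual Gramadóir script. The affix table, the `candidate_roots` function, and the 0.8 threshold are all made up for illustration, and the rules are bare (strip, add) suffix pairs with none of the conditions a real affix file carries.

```python
from collections import defaultdict

# Hypothetical, simplified affix table: flag -> list of (strip, add) suffix rules.
# A rule builds a surface form by removing `strip` from the end of the root
# and appending `add`. Real affix files also attach conditions to each rule;
# this sketch ignores them.
AFFIXES = {
    "A": [("", "ker"), ("", "king"), ("", "ked"),
          ("", "kish"), ("", "alicious"), ("", "s")],
}


def candidate_roots(words, affixes, threshold=0.8):
    """Return (root, flag, coverage) tuples, highest coverage first.

    coverage is the fraction of the flag's rules that, applied in reverse
    to some corpus word, yield the same root.
    """
    vocabulary = set(words)
    results = []
    for flag, rules in affixes.items():
        # root -> indices of the rules that produced it when reversed
        hits = defaultdict(set)
        for word in vocabulary:
            for index, (strip, add) in enumerate(rules):
                # Reverse the rule: cut off `add`, put `strip` back.
                if add and word.endswith(add) and len(word) > len(add):
                    root = word[: len(word) - len(add)] + strip
                    hits[root].add(index)
        for root, matched in hits.items():
            coverage = len(matched) / len(rules)
            if coverage >= threshold:
                results.append((root, flag, coverage))
    # Rank by coverage (descending), then alphabetically for stable output.
    return sorted(results, key=lambda item: (-item[2], item[0]))


corpus = "grokker grokking grokked grokkish grokalicious the cat".split()
for root, flag, coverage in candidate_roots(corpus, AFFIXES):
    print(f"{root}/{flag}\t{coverage:.0%}")
```

On this toy corpus the only candidate above the threshold is "grok" under flag A, with 5 of the 6 rules matching. Lowering the threshold also brings in noise such as "grokaliciou" (from stripping a bare "s"), which is why the output still wants a manual check.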
