Kevin,

> > However, affixing has 2 parts:
> > 1. create an affix file
> > 2. add the proper affixes to the individual words in the dictionary
> > file.
> > 
> > I completely miss 2. in gramadoir.
> 
> Yes, I have some of 2. in place for Gramadóir.   And
> this part I admit to being completely undocumented!
> 
> All that it amounts to at this stage is a simple-minded
> Perl script that takes as input an affix file
> and a large plain text corpus of text in the
> target language (from http://borel.slu.edu/crubadan/).
> For each flag in the affix file, it applies
> the rules under that flag *in reverse*
> (i.e. strips affixes) to all of
> the words it sees in the corpus and looks for
> common "root words".

Can it only strip? The condition and the
modification (what gets added back) are important too.

For example,
alma
is almák in the plural,
so the reverse rule is: strip ák and add a to get the original word.
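
A minimal Python sketch of what "applying a rule in reverse" would have to do, assuming a suffix rule has the same three parts as a MySpell SFX line (strip, add, condition). The class and function names are hypothetical, and the Hungarian rule is only an illustration, not taken from any real affix file:

```python
import re
from typing import NamedTuple, Optional


class SuffixRule(NamedTuple):
    strip: str      # characters removed from the root before the suffix is added
    add: str        # characters appended to build the inflected form
    condition: str  # regex that the end of the root must match


def unapply(rule: SuffixRule, word: str) -> Optional[str]:
    """Undo a suffix rule; return the candidate root, or None if it cannot apply."""
    if not rule.add or not word.endswith(rule.add):
        return None
    # Remove what the rule added, then put back what the rule stripped.
    root = word[: len(word) - len(rule.add)] + rule.strip
    # The condition is checked on the reconstructed root, not on the surface form.
    if re.search(rule.condition + "$", root) is None:
        return None
    return root


# Illustrative rule: a root ending in "a" drops it and takes "ák" in the plural.
plural_a = SuffixRule(strip="a", add="ák", condition="a")
print(unapply(plural_a, "almák"))
```

Stripping alone would leave "alm"; the root only comes back because the stripped "a" is restored and the condition is checked afterwards.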
 
> For example, imagine there are 6 rules under flag "A",
> and I find words like "grokker", "grokking", "grokked",
> "grokkish", "grokalicious" in the corpus
> such that 5 out of 6 of the flag A rules apply
> to give the root "grok".  Then it might be safe to add
> "grok/A" to the word list.  

Why would it be safe if you could not find a form for the sixth
rule in the corpus?
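
To make the question concrete, here is a rough Python sketch of the ranking step as I understand it from the description. It is not the actual Perl script; the six rules under the hypothetical flag "A" are made up to fit the "grok" example, and conditions are left out for brevity. It also lists the suffixes that were never seen, since those are the ones a human would have to check:

```python
from collections import defaultdict

# Made-up rules under a hypothetical flag "A", written as (strip, add) pairs.
FLAG_A = [("", "ker"), ("", "king"), ("", "ked"),
          ("", "kish"), ("", "alicious"), ("", "kest")]


def rank_roots(rules, corpus_words):
    """Rank candidate roots by the fraction of the flag's rules seen in the corpus."""
    attested = defaultdict(set)
    for word in corpus_words:
        for index, (strip, add) in enumerate(rules):
            if add and word.endswith(add) and len(word) > len(add):
                # Reverse the rule: drop the added suffix, restore the stripped part.
                attested[word[: -len(add)] + strip].add(index)
    report = []
    for root, seen in attested.items():
        missing = [rules[i][1] for i in range(len(rules)) if i not in seen]
        report.append((len(seen) / len(rules), root, missing))
    # Best-covered roots first, ties broken alphabetically.
    return sorted(report, key=lambda item: (-item[0], item[1]))


corpus = ["grokker", "grokking", "grokked", "grokkish", "grokalicious"]
for score, root, missing in rank_roots(FLAG_A, corpus):
    print(f"{root}/A: {score:.0%} of rules attested, never seen: {missing}")
```

A high score says only that most forms occur. Adding "grok/A" to the word list still licenses the unseen form as well, which is exactly my worry.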

> These candidates can be ranked by percentage if you like;
> in any case it's usually a good idea to check the output
> manually.  We've had some luck with this approach for Basque,
> which has rich morphology.
> 
> It would be nice to generalize this approach
> to work with HunSpell if anyone's feeling
> up to the task; I imagine it could
> start to get computationally expensive
> for large multilevel affix files and large corpora.
> 
> I'm guessing I'm not the first to have written
> something like this - in fact, maybe Laci et al
> already have something like this in HunSpell;
> I admit I haven't looked carefully yet.
> 
> The other important question is automatically
> constructing the affix file itself from a plain text
> corpus.  This is obviously much harder.   
> Anyone interested in this question should have
> a look at John Goldsmith's Linguistica
> project at the Univ. of Chicago.  I've played
> around with the demo and it looks promising.
> http://linguistica.uchicago.edu/

Example 3-1 in http://borel.slu.edu/gramadoir/manual/c409.html#POS says:

dipper 31
dire 36
direct 33
direct 36
direct 37
directed 36
direction 31
directional 36
directions 32

What do 31, 32, etc. mean?
Are they groups of flags as in MySpell?
Where are these flags documented?
Where is their connection with the affixes documented?

I also cannot see the part of speech there (verb, noun, etc.).

Finally: how are the grammatical errors formulated
and entered into Gramadóir?

For example, an error in Hungarian:
I see two boys
Writing "boys" here is an error, because
when a number is present, the noun must be
singular.

In plain language: after a verb, if there is a number or a word
that expresses quantity (many, several, some), the following noun
must be singular.
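
Written as a check over tagged words, the rule would look roughly like the Python sketch below. The tag names (NUM, QUANT, NOUN_PL) are my own invention and not Gramadóir's codes, English glosses stand in for the Hungarian words, and the "after a verb" part is left out to keep it short:

```python
from typing import List, Tuple

# Hypothetical tags for numbers and quantity words (many, several, some).
QUANTITY_TAGS = {"NUM", "QUANT"}


def find_plural_after_quantity(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return the positions of plural nouns that directly follow a quantity word."""
    errors = []
    for position in range(1, len(tagged)):
        _, previous_tag = tagged[position - 1]
        _, tag = tagged[position]
        if previous_tag in QUANTITY_TAGS and tag == "NOUN_PL":
            errors.append(position)
    return errors


sentence = [("I", "PRON"), ("see", "VERB"), ("two", "NUM"), ("boys", "NOUN_PL")]
for position in find_plural_after_quantity(sentence):
    print(f"word {position} ({sentence[position][0]}): noun must be singular")
```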

How would I formulate this in Gramadóir?

Thanks, Eleonora
