Hi,
You are right, your affix table needs some optimization. The second example
is much better: analyzing multilevel suffixes takes more time, and zero
suffixes have the biggest overhead, because word analysis checks *all* zero
affix rules for every input word and every suggestion candidate. There are
too many zero affixes in your affix table:
$ cat fr.aff | LC_ALL=C awk 'BEGIN{FS="[ \t/]*"}/^[SP]FX/ && NF>4{print$4}' |
  sort | uniq -c | sort -nrk 1 | head
386 s
* 385 0*
321 ais
222 ons
209 ions
209 iez
207 ez
157 ait
151 is
148 aient
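
The same count can be sketched in Python, which may be easier to adapt than the awk one-liner. This is only a rough equivalent under stated assumptions: the function name count_added_affixes is made up for illustration, and a line is treated as a rule only if it has at least five whitespace-separated fields (so a rule without a condition field would be skipped).

```python
from collections import Counter


def count_added_affixes(aff_lines):
    """Count how often each added affix string occurs in SFX/PFX rule lines.

    Hypothetical helper: assumes rule lines look like
    'SFX flag strip add[/flags] condition [morphology]'.
    """
    counts = Counter()
    for line in aff_lines:
        fields = line.split()
        # Header lines such as 'SFX S. Y 1' have only four fields; skip them.
        if len(fields) < 5 or fields[0] not in ("SFX", "PFX"):
            continue
        # Drop continuation flags such as '/S.' from the added affix.
        added = fields[3].split("/", 1)[0]
        counts[added] += 1
    return counts


sample = [
    "SFX S. Y 1",
    "SFX S. 0 s [^sxz] /pl",
    "SFX F. N 3",
    "SFX F. 0 0 . .fem",
    "SFX F. e 0/S. [éiï]e .mas",
    "SFX F. rice eur/S. [dt]rice .mas",
]
counts = count_added_affixes(sample)
```

Passing the lines of the real affix file (opened with the encoding it declares) and calling most_common(10) on the result should give a listing comparable to the one above; an entry for "0" counts the zero affixes.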
It would also be better to reduce the redundant suffixes (with different
stripping characters) of irregular words by (1) using pseudoroots with
NEEDAFFIX flags, (2) generating forms from common suffixed forms, or simply
(3) adding new dictionary items. Examples:
(1) Flag x defines the -s and -d suffixes; "ha" is a pseudoroot (flag ! is
the NEEDAFFIX flag):
shake/x -> shake, shakes, shaked
ha/x! -> has, had
have -> have
(2) Flag y strips a final "d" and adds the suffix "s":
shaked/y -> shaked, shakes
shake -> shake
had/y -> had, has
have -> have
(3)
shake/x
had
has
have
The Hungarian dictionary uses (2) for irregular nouns and (3) for many
irregular verbs.
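
To check that the three variants above accept the same words, here is a minimal Python sketch of suffix expansion. It is a toy model, not Hunspell's implementation: the function expand, the rule tuples (strip, add, condition) and the way flags are stored as single characters are all assumptions made for this illustration.

```python
import re


def expand(entries, rules, needaffix="!"):
    """Return every word form generated by a toy dictionary.

    entries: list of (word, flags) pairs, flags being a string of
             one-character flags.
    rules:   dict mapping a flag to a list of (strip, add, condition).
    """
    forms = set()
    for word, flags in entries:
        # A pseudoroot carrying the NEEDAFFIX flag is not a word by itself.
        if needaffix not in flags:
            forms.add(word)
        for flag in flags:
            for strip, add, condition in rules.get(flag, []):
                if not word.endswith(strip):
                    continue
                # The condition is matched against the end of the word.
                if not re.search(f"(?:{condition})$", word):
                    continue
                stem = word[: len(word) - len(strip)]
                forms.add(stem + add)
    return forms


# Flag x adds -s and -d; flag y strips a final "d" and adds "s".
rules_x = {"x": [("", "s", "."), ("", "d", ".")]}
rules_y = {"y": [("d", "s", "d")]}

approach_1 = expand([("shake", "x"), ("ha", "x!"), ("have", "")], rules_x)
approach_2 = expand(
    [("shaked", "y"), ("shake", ""), ("had", "y"), ("have", "")], rules_y
)
approach_3 = expand(
    [("shake", "x"), ("had", ""), ("has", ""), ("have", "")], rules_x
)
```

In this toy model all three sets contain the same six forms, so the choice between the variants is about the size and cost of the affix table, not about coverage.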
Regards,
László
2008/8/20 Olivier R. <[EMAIL PROTECTED]>
> What's the best way to define affix rules?
>
>
> For example:
>
> Flag S. defines how to make plural forms in French, with one rule:
>
> SFX S. Y 1
> SFX S. 0 s [^sxz] /pl
>
>
> Other flags often call S. to generate their plural inflections, e.g.:
>
> SFX F. N 36
> SFX F. 0 0 . .fem
> SFX F. 0 s [eë] .fem/pl
> SFX F. e 0/S. [éiï]e .mas
> SFX F. rice eur/S. [dt]rice .mas
> SFX F. e 0/S. de .mas
> SFX F. fe 0/S. ffe .mas
> SFX F. he 0/S. [^è]che .mas
> SFX F. èche ec/S. èche .mas
> SFX F. e 0/S. [ut]he .mas
> SFX F. e 0/S. ke .mas
> SFX F. e 0/S. ale .mas
> SFX F. e 0/S. [iouû]le .mas
> SFX F. le 0/S. [eiu]lle .mas
> SFX F. e 0/S. [aiou]ne .mas
> SFX F. ne 0/S. [aeo]nne .mas
> SFX F. gne n/S. igne .mas
> SFX F. e 0/S. [aiuûy]re .mas
> SFX F. ère er/S. ère .mas
> SFX F. e 0 [^us]se .mas.inv
> SFX F. sse 0/S. [^eo].esse .mas
> SFX F. resse ur/S. eresse .mas
> SFX F. oresse eur/S. oresse .mas
> SFX F. se 0 [^e]sse .mas.inv
> SFX F. e 0 [^eo]use .mas.inv
> SFX F. se r/S. euse .mas
> SFX F. e 0/S. [^èt]te .mas
> SFX F. te 0/S. tte .mas
> SFX F. ète et/S. ète .mas
> SFX F. e 0/S. [^gq]ue .mas
> SFX F. ue 0/S. gue .mas
> SFX F. que 0/S. cque .mas
> SFX F. que c/S. [^c]que .mas
> SFX F. ève ef/S. ève .mas
> SFX F. ve f/S. [iïu]ve .mas
> SFX F. ë 0/S. uë .mas
> SFX F. üe u/S. üe .mas
>
>
> But I could write F. differently, like this:
>
> SFX F' Y 68
> SFX F' 0 0 . .fem
> SFX F' 0 s [eë] .fem/pl
> SFX F' e 0 [éiï]e .mas
> SFX F' e s [éiï]e .mas/pl
> SFX F' rice eur [dt]rice .mas
> SFX F' rice eurs [dt]rice .mas/pl
> SFX F' e 0 de .mas
> SFX F' e s de .mas/pl
> SFX F' fe 0 ffe .mas
> SFX F' fe s ffe .mas/pl
> SFX F' he 0 [^è]che .mas
> SFX F' he s [^è]che .mas/pl
> SFX F' èche ec èche .mas
> SFX F' èche ecs èche .mas/pl
> SFX F' e 0 [ut]he .mas
> SFX F' e s [ut]he .mas/pl
> SFX F' e 0 ke .mas
> SFX F' e s ke .mas/pl
> SFX F' e 0 ale .mas
> SFX F' e s ale .mas/pl
> SFX F' e 0 [iouû]le .mas
> SFX F' e s [iouû]le .mas/pl
> SFX F' le 0 [eiu]lle .mas
> SFX F' le s [eiu]lle .mas/pl
> SFX F' e 0 [aiou]ne .mas
> SFX F' e s [aiou]ne .mas/pl
> SFX F' ne 0 [aeo]nne .mas
> SFX F' ne s [aeo]nne .mas/pl
> SFX F' gne n igne .mas
> SFX F' gne ns igne .mas/pl
> SFX F' e 0 [aiuûy]re .mas
> SFX F' e s [aiuûy]re .mas/pl
> SFX F' ère er ère .mas
> SFX F' ère ers ère .mas/pl
> SFX F' e 0 [^us]se .mas.inv
> SFX F' sse 0 [^eo].esse .mas
> SFX F' sse s [^eo].esse .mas/pl
> SFX F' resse ur eresse .mas
> SFX F' resse urs eresse .mas/pl
> SFX F' oresse eur oresse .mas
> SFX F' oresse eurs oresse .mas/pl
> SFX F' se 0 [^e]sse .mas.inv
> SFX F' e 0 [^eo]use .mas.inv
> SFX F' se r euse .mas
> SFX F' se rs euse .mas/pl
> SFX F' e 0 [^èt]te .mas
> SFX F' e s [^èt]te .mas/pl
> SFX F' te 0 tte .mas
> SFX F' te s tte .mas/pl
> SFX F' ète et ète .mas
> SFX F' ète ets ète .mas/pl
> SFX F' ète et ète .mas
> SFX F' ète ets ète .mas/pl
> SFX F' e 0 [^gq]ue .mas
> SFX F' e s [^gq]ue .mas/pl
> SFX F' ue 0 gue .mas
> SFX F' ue s gue .mas/pl
> SFX F' que 0 cque .mas
> SFX F' que s cque .mas/pl
> SFX F' que c [^c]que .mas
> SFX F' que cs [^c]que .mas/pl
> SFX F' ève ef ève .mas
> SFX F' ève efs ève .mas/pl
> SFX F' ve f [iïu]ve .mas
> SFX F' ve fs [iïu]ve .mas/pl
> SFX F' ë 0 uë .mas
> SFX F' ë s uë .mas/pl
> SFX F' üe u üe .mas
> SFX F' üe us üe .mas/pl
>
> F' does the same thing as F.
>
> Which of the two is better?
>
>
> Regards,
> Olivier