Re: [lingu-dev] Hunspell - slow to make suggestions

Németh László Thu, 21 Aug 2008 01:58:29 -0700

Hi Olivier,

2008/8/20 Olivier R. <[EMAIL PROTECTED]>


> Hi László,
>
> Thank you for the explanations.


I'm glad about the French dictionary improvements and about your questions.


>
>
>
>> Zero suffixes have the biggest overhead. Word analysis checks *all* zero
>> affix rules for every input word and suggestion candidate. There are too
>> many zero affixes in your affix table:
>>
>
> So, if I have understood properly, it is quicker for Hunspell to strip and
> add something than to strip and add nothing.
>
> For instance:
>     SFX F.   nne        n                   [aeo]nne            .mas
> is better than
>     SFX F.   ne         0                   [aeo]nne            .mas
>
> Right?


Absolutely.
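
A rough way to picture why this matters, as a simplified Python model and
not Hunspell's actual code: assume suffix rules are bucketed by the last
character of the string they append. A zero suffix appends nothing, so it
cannot be bucketed and has to be tried for every word. The names
SuffixRule, build_index and rules_to_try are hypothetical and exist only
for this sketch.

```python
from collections import defaultdict


class SuffixRule:
    def __init__(self, strip, add):
        self.strip = strip  # characters removed from the dictionary word
        self.add = add      # characters appended; "" models a zero ("0") suffix


def build_index(rules):
    """Bucket rules by the last character of the appended string.

    Zero suffixes have no such character, so they all share the "" bucket.
    """
    index = defaultdict(list)
    for rule in rules:
        key = rule.add[-1] if rule.add else ""
        index[key].append(rule)
    return index


def rules_to_try(index, word):
    """Rules that must be examined for a non-empty `word`.

    Every zero-suffix rule is a candidate, plus only those rules whose
    appended string ends with the same character as the word.
    """
    return index.get("", []) + index.get(word[-1], [])


# The two variants from the example above (strip, add).
zero_style = build_index([SuffixRule("ne", "")])
nonzero_style = build_index([SuffixRule("nne", "n")])

print(len(rules_to_try(zero_style, "table")))     # 1: tried although unrelated
print(len(rules_to_try(nonzero_style, "table")))  # 0: skipped immediately
```

In this model every zero rule in the affix table is paid for on each input
word and each suggestion candidate, while a non-zero rule is only looked at
when the word ends the right way.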

>
>
> That is not intuitive. I suggest that this should be explained in the
> documentation, to prevent others from making bloated stuff as I did. :)


I will add a note to the manual. Thanks for the tip.


>
>
>
>> Also, it would be better to reduce the redundant suffixes (with different
>> stripping characters) of irregular words by (1) pseudoroots with NEEDAFFIX
>> flags, (2) generating from common suffixed forms, or simply (3) new
>> dictionary items.
>>
>
> OK. I'll try this, if removing 0 affixes is not sufficient.
>
> Actually, I already use the NEEDAFFIX flag a lot, but in a different way.
> Of the 60,000 entries, ~28,000 are tagged with it (among them, all the 7,000
> verbs).
>
>
>
> The 0 affixes issue makes me wonder how things work with the conditional
> field... :)
>
> For example, I have several verbs which end in 'cevoir'. I have a flag
> for these verbs and only for them.
>
> Which is the better way to write the conditional field?
> With
> - a long field, ie:      cevoir
> - a short field, ie:     ir
> - no condition, ie:      .
> ?
>

Using (long) conditions is better, because they can save dictionary
lookups.
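
A minimal Python sketch of the effect, assuming a simplified analyzer that
tests the condition on the reconstructed stem before consulting the
dictionary. This is not Hunspell's implementation: the rule (strip "oir",
add "ait"), the word list and the function count_lookups are made up for
illustration, flags are ignored, and the condition is treated as a regular
expression anchored at the end of the stem.

```python
import re


def count_lookups(words, rules, dictionary):
    """Return (stems found, number of dictionary lookups performed)."""
    stems, lookups = [], 0
    for word in words:
        for strip, add, condition in rules:
            if add and not word.endswith(add):
                continue
            # Undo the rule: remove the appended part, restore the stripped part.
            stem = word[:len(word) - len(add)] + strip
            # "." accepts everything; otherwise the stem must end with the condition.
            if condition != "." and not re.search(condition + "$", stem):
                continue
            lookups += 1  # only stems that pass the condition reach the dictionary
            if stem in dictionary:
                stems.append(stem)
    return stems, lookups


dictionary = {"recevoir"}
words = ["recevait", "savait", "parlait"]

for condition in (".", "ir", "cevoir"):
    rules = [("oir", "ait", condition)]
    print(condition, count_lookups(words, rules, dictionary))
# . (['recevoir'], 3)
# ir (['recevoir'], 3)
# cevoir (['recevoir'], 1)
```

In this toy setup the short condition "ir" is implied by the stripped
characters and rejects nothing, while "cevoir" discards the candidates
"savoir" and "parloir" before any lookup happens.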


>
>
> If it is possible, is a full-word condition advised or not?


I believe using dictionary items instead of affix rules with full-word
conditions is better, but this makes little difference when spell checking
normal texts. There is no problem with full-word conditions, but
Hunspell 1.1.12 in the recent OpenOffice.org uses only the first 8
characters of the conditions. Hunspell 1.2.x has a new condition checking
algorithm without this limit. Hunspell 1.2.7 also has an optional FULLSTRIP
mode for word-length stripping: see
http://www.openoffice.org/issues/show_bug.cgi?id=80145.

Best regards,

László


>
>
>
> Thank you for helping.
>
> Best regards,
> Olivier
>
> --
>
> == N'écrivez pas à cette adresse. Dédiée aux listes de discussion. ==
> ** Do not write at this address. Mailing-list only. **
>
>
