On 2013-04-07 11:07, Jaume Ortolà i Font wrote:
> Hi,
>
> I have made an improvement to the Morfologik speller rule. If few
> suggestions are found, it tries to get more from the word without
> diacritics. This is useful in Catalan, and I guess in other languages.
>
> The next step would be the other way around: if few suggestions are
> found, then try with some replacement patterns (adding diacritics in
> some cases...). These patterns have to be language-dependent, of course.
> Is it OK to write this in MorfologikSpellerRule.java (with the patterns
> in the corresponding language rule), Marcin?
>
> A further step for improving suggestions could be to use a dictionary of
> frequencies. With this information the suggestions could be ordered: the
> more frequent words first.

Actually, this should be done at the level of the automaton search to
make it much faster. We can start hacking around the simplified code of
MorfologikSpeller, but that could slow the whole thing down drastically
(remember that the hunspell slowdown comes from the suggestion part).
It's not so hard to add a diacritics search, as it was already part of
fsa_spell, but the easy part there was that it relied on 8-bit
encodings, and with UTF-8 we can no longer assume that every character
is just a byte. I don't remember how I traverse the automaton right now,
but I believe I started with a simplistic version, intending to add
UTF-8 handling later on, so maybe it's now easier to implement.
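
To make the byte-versus-character point concrete, here is a minimal
sketch in plain Java. The class name Utf8Width and its helpers are made
up for illustration and are not part of morfologik or LanguageTool. It
only shows that a word with one accented letter has the same number of
bytes as characters in an 8-bit encoding, but one byte more in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Width {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies in the 8-bit ISO-8859-1 encoding.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String word = "Ortol\u00E0"; // "Ortolà", written with an escape
        System.out.println(word.length() + " characters, "
                + latin1Length(word) + " bytes in ISO-8859-1, "
                + utf8Length(word) + " bytes in UTF-8");
    }
}
```

So a traversal that walks the automaton one byte at a time cannot simply
swap one byte for another to try an accented variant.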

The trick is to use a special replacement table when traversing the 
automaton. This way, it's as speedy as it was before.
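
A rough sketch of that idea, under heavy assumptions: a plain trie
stands in for the real automaton, it works on Java chars rather than on
the bytes of the dictionary encoding, and the names DiacriticTrie,
addReplacements and lookup are hypothetical, not the morfologik API. At
each step the walk follows the input character itself and then every
variant listed for it in the replacement table, so only branches that
exist in the dictionary are ever visited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the dictionary automaton, for illustration only.
class DiacriticTrie {
    private static final class Node {
        final Map<Character, Node> next = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();
    // Replacement table: plain character -> accented variants to try as well.
    private final Map<Character, char[]> replacements = new HashMap<>();

    void addWord(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    void addReplacements(char plain, String variants) {
        replacements.put(plain, variants.toCharArray());
    }

    // Returns every dictionary word equal to the input up to the table's replacements.
    List<String> lookup(String input) {
        List<String> out = new ArrayList<>();
        walk(root, input, 0, new StringBuilder(), out);
        return out;
    }

    private void walk(Node node, String input, int pos, StringBuilder path, List<String> out) {
        if (pos == input.length()) {
            if (node.isWord) {
                out.add(path.toString());
            }
            return;
        }
        char c = input.charAt(pos);
        step(node, c, input, pos, path, out);      // the character as typed
        char[] variants = replacements.get(c);
        if (variants != null) {
            for (char v : variants) {              // then each variant from the table
                step(node, v, input, pos, path, out);
            }
        }
    }

    private void step(Node node, char c, String input, int pos, StringBuilder path, List<String> out) {
        Node child = node.next.get(c);
        if (child == null) {
            return;                                // no such arc: prune this branch
        }
        path.append(c);
        walk(child, input, pos + 1, path, out);
        path.setLength(path.length() - 1);         // backtrack
    }
}
```

For example, with the words "camí", "camió", "canto" and "cantó" in the
trie and the table mapping i to í/ï and o to ó/ò, looking up "cami"
finds "camí", and looking up "canto" finds both "canto" and "cantó". The
per-language table would live with the language rule, as Jaume suggests.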

But coding this, eh, is not so easy. Right now I'm bogged down with 
another project, and I cannot really sit down and code it...

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel