There is a relatively fast way to deal with alternatives.

For every word one could compute a number, using this recipe:
- lowercase each character
- unaccent each character
- take the ASCII value of each letter and raise it to the fifth power
- add all these numbers

This gives a 'fast lookup' number for every word. The order of the letters is dropped this way. It is also easy to apply the most common 'replacements' to a number like this, since every replacement is a simple addition or subtraction.
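
To make the recipe concrete, here is a minimal sketch in Python. The names word_number, strip_accents and replacement_delta are made up for this illustration and are not part of LanguageTool or Morfologik. It also assumes the Unicode code point can stand in for the "ASCII value" once the accents are stripped.

```python
import unicodedata


def strip_accents(ch):
    # Decompose the character and drop combining marks, e.g. "é" -> "e".
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def word_number(word):
    # Hypothetical helper: sum of (character code ** 5) over the
    # lowercased, unaccented characters of the word.
    total = 0
    for ch in word.lower():
        for base in strip_accents(ch):
            total += ord(base) ** 5
    return total


def replacement_delta(old, new):
    # Replacing the letters in `old` by the letters in `new` changes the
    # word number by this fixed amount, wherever the letters occur.
    return word_number(new) - word_number(old)
```

With these definitions "abc" and "cba" get the same number, and adding replacement_delta("f", "ph") to the number of "fone" gives the number of "phone". Note that two different sets of letters could in principle end up with the same sum, so the number is only a filter and not a proof of a match.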

When an unknown word is found, compute its number(s), look up the known words with those numbers, order them according to frequency, and drop the ones with too large a Levenshtein distance.
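
The lookup could be sketched like this in Python. Everything here is an assumption made for illustration: the function names, the frequency dictionary, the list of replacement pairs and the default maximum distance of 2 are invented and do not come from LanguageTool or Morfologik.

```python
import unicodedata
from collections import defaultdict


def word_number(word):
    # Sum of (character code ** 5) over lowercased, unaccented characters.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return sum(ord(c) ** 5 for c in decomposed if not unicodedata.combining(c))


def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(frequencies):
    # One-time job: map every word number to the known words that have it.
    index = defaultdict(list)
    for word in frequencies:
        index[word_number(word)].append(word)
    return index


def suggest(unknown, index, frequencies, replacements, max_distance=2):
    lowered = unknown.lower()
    base = word_number(lowered)
    numbers = {base}
    # Each replacement is a simple subtract and add on the number.
    for old, new in replacements:
        if old in lowered:
            numbers.add(base - word_number(old) + word_number(new))
    candidates = {w for n in numbers for w in index.get(n, [])}
    close = [w for w in candidates if levenshtein(lowered, w) <= max_distance]
    # Most frequent words first.
    return sorted(close, key=lambda w: -frequencies[w])
```

For example, with frequencies {"form": 50, "from": 900, "phone": 300} and the single replacement ("f", "ph"), the unknown word "fomr" finds "from" and "form" through its own number, and "fone" finds "phone" through the adjusted number.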

It is not necessary to do this computing all the time. For all known words, building a plain list that maps known errors to alternatives is a one-time computing job.

Ruud

On 07-04-13 11:07, Jaume Ortolà i Font wrote:
Hi,

I have made an improvement in the Morfologik speller rule. If few suggestions are found, it then tries to get more from the word without diacritics. This is useful in Catalan, and I guess in other languages.

The next step would be the other way around: if few suggestions are found, then try with some replacement patterns (adding diacritics in some cases...). These patterns have to be language-dependent, of course. Is it OK to write this in MorfologikSpellerRule.java (with the patterns in the corresponding language rule), Marcin?

A further step for improving suggestions could be to use a dictionary of frequencies. With this information the suggestions could be ordered: the more frequent words first.

Regards,
Jaume Ortolà



_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
