On 2013-04-07 11:07, Jaume Ortolà i Font wrote:
> Hi,
>
> I have made an improvement in the Morfologik speller rule. If few
> suggestions are found, then try to get more from the word without
> diacritics. This is useful in Catalan, and I guess in other languages.
>
> The next step would be the other way around: if few suggestions are
> found, then try with some replacement patterns (adding diacritics in
> some cases...). These patterns have to be language-dependent, of course.
> Is it OK to write this in MorfologikSpellerRule.java (with the patterns
> in the corresponding language rule), Marcin?
>
> A further step for improving suggestions could be to use a dictionary of
> frequencies. With this information the suggestions could be ordered: the
> more frequent words first.

Actually, this should be done at the level of the automaton search to make it much faster. We can start hacking around the simplified code of MorfologikSpeller, but that could slow the whole thing down drastically (remember that the hunspell slowdown comes from the suggestion part).

It's not so hard to add diacritics search, as it was already part of fsa_spell, but the easy part there was that it relied on 8-bit encodings; with UTF-8, we can no longer assume that every character is just one byte. I don't remember how I traverse the automaton right now, but I believe I started with a simplistic version, intending to add some UTF-8 handling later on, so maybe it's now easier to implement.

The trick is to use a special replacement table when traversing the automaton. This way, it's as speedy as it was before. But coding this, eh, is not so easy. Right now I'm bogged down with another project, and I cannot really sit down and code it...

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
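
For illustration only, the replacement-table idea from the reply can be sketched roughly as below. This is not the Morfologik or fsa_spell code: a toy character trie stands in for the dictionary automaton, and the names DiacriticSearch, addReplacement, addWord and find are all hypothetical. The sketch assumes the traversal works on Java chars rather than bytes (so only BMP characters are handled), which sidesteps the UTF-8 problem instead of solving it for a byte-labelled automaton.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy character trie standing in for the dictionary automaton (hypothetical, not Morfologik's FSA). */
class DiacriticSearch {
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Replacement table: a plain character mapped to the variants that are also tried.
    // The table is meant to be language-dependent and supplied by the caller.
    private final Map<Character, char[]> replacements = new HashMap<>();

    void addReplacement(char plain, String variants) {
        replacements.put(plain, variants.toCharArray());
    }

    void addWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    /** Returns dictionary words that match the input exactly or via the replacement table. */
    List<String> find(String input) {
        List<String> out = new ArrayList<>();
        walk(root, input, 0, new StringBuilder(), out);
        return out;
    }

    private void walk(Node node, String input, int pos, StringBuilder path, List<String> out) {
        if (pos == input.length()) {
            if (node.isWord) {
                out.add(path.toString());
            }
            return;
        }
        char c = input.charAt(pos);
        // Follow the arc for the typed character first, then the arcs for its variants.
        step(node, c, input, pos, path, out);
        char[] variants = replacements.get(c);
        if (variants != null) {
            for (char v : variants) {
                step(node, v, input, pos, path, out);
            }
        }
    }

    private void step(Node node, char label, String input, int pos,
                      StringBuilder path, List<String> out) {
        Node child = node.next.get(label);
        if (child == null) {
            return; // no such arc, so this branch is pruned immediately
        }
        path.append(label);
        walk(child, input, pos + 1, path, out);
        path.setLength(path.length() - 1); // backtrack
    }
}
```

With 'o' mapped to "óò" and a dictionary containing "camió", calling find("camio") would return that word in a single pass over the trie. Because variants are only followed where an arc actually exists, the extra cost stays small, which is the point of doing this inside the traversal rather than generating candidate strings and looking each one up.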
