As predicted, the code I wrote for multiple-character substitutions had several bugs. I have fixed them (see the attachment), but more problems could arise with other languages or other substitutions.
Here I would like to discuss another approach to generating spelling suggestions: checking the words with substitutions directly. Several steps are possible, and each step is taken only if the previous one found no suggestions. The steps could be:

1) Run a tree search.
2) Prepare words with substitutions and check whether they are still misspelled.
3) Run a new tree search on the words with substitutions.

Note that step 2) is very cheap and step 3) is expensive. Step 2) could even be the first step.

Would this approach be more or less efficient? It depends on the kind and number of errors we find in the texts. When a word contains only multiple-character substitution errors (one or more), it will be faster. When a word contains one multiple-character substitution error plus another kind of error, it will be slower. So the only way to decide is to try both approaches and see which one performs better statistically.

Note that using multiple-character substitution inside the tree search algorithm is not as costly as repeating the tree search; its cost is somewhere in between.

Best regards,
Jaume
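P.S. To make the three steps concrete, here is a minimal sketch in Java. It is not the code from the attached Speller.java: the class `StagedSuggester`, the method `substitutionVariants`, and the `treeSearch` function are hypothetical names used only for illustration, and the tree search itself is passed in as a black box. It also assumes that each variant applies a single replacement at a single position, which is only one of several possible ways to "prepare words with substitutions".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration of the staged approach; not the real Speller API.
class StagedSuggester {
    private final Set<String> dictionary;                       // known correct words
    private final Map<String, String> replacements;             // e.g. "ph" -> "f"
    private final Function<String, List<String>> treeSearch;    // assumed: the existing tree search

    StagedSuggester(Set<String> dictionary,
                    Map<String, String> replacements,
                    Function<String, List<String>> treeSearch) {
        this.dictionary = dictionary;
        this.replacements = replacements;
        this.treeSearch = treeSearch;
    }

    // Builds every variant obtained by applying ONE replacement at ONE position.
    List<String> substitutionVariants(String word) {
        Set<String> variants = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String from = e.getKey();
            if (from.isEmpty()) {
                continue; // an empty pattern would match everywhere
            }
            int idx = word.indexOf(from);
            while (idx >= 0) {
                variants.add(word.substring(0, idx) + e.getValue()
                        + word.substring(idx + from.length()));
                idx = word.indexOf(from, idx + 1);
            }
        }
        return new ArrayList<>(variants);
    }

    // Each step runs only if the previous one produced no suggestions.
    List<String> suggest(String word) {
        // Step 1: ordinary tree search on the misspelled word.
        List<String> found = treeSearch.apply(word);
        if (!found.isEmpty()) {
            return found;
        }
        // Step 2 (cheap): are any of the substituted words in the dictionary?
        List<String> variants = substitutionVariants(word);
        List<String> direct = new ArrayList<>();
        for (String v : variants) {
            if (dictionary.contains(v)) {
                direct.add(v);
            }
        }
        if (!direct.isEmpty()) {
            return direct;
        }
        // Step 3 (expensive): one new tree search per substituted word.
        Set<String> fromVariants = new LinkedHashSet<>();
        for (String v : variants) {
            fromVariants.addAll(treeSearch.apply(v));
        }
        return new ArrayList<>(fromVariants);
    }
}
```

Since step 2) is just a few dictionary lookups, moving it before step 1) would only mean swapping the first two blocks in `suggest`. Which order is better would still have to be measured on real texts.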
Speller.java
Description: Binary data
