As predicted, the code I wrote for multiple character substitutions had
several bugs. I have fixed them (see the attachment), but more problems
could arise with other languages or other substitutions.

Here I would like to discuss another approach to generating spelling
suggestions: checking the words with substitutions applied directly. It
could proceed in several steps, each one taken only if the previous step
found no suggestions. These could be the steps:

1) Do a tree search on the misspelled word.
2) Build the words with substitutions applied and check each one: is it
   still a misspelled word? If not, it is a suggestion.
3) Do a new tree search on the words with substitutions applied.

Note that step 2) is very cheap, while step 3) is expensive. Step 2) could
even be run as the first step.
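
The steps above could be sketched roughly as follows. This is only an
illustration under assumptions, not the code in the attachment: the class
name StagedSuggester, the substitution map (for example "f" -> "ph") and
the treeSearch function are hypothetical placeholders, and the dictionary
is a plain Set rather than the real speller dictionary.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class StagedSuggester {

    /**
     * Builds every word obtained by applying ONE substitution at ONE
     * position. Combining several substitutions in the same word is
     * left out to keep the sketch short.
     */
    static Set<String> substitutionVariants(String word, Map<String, String> substitutions) {
        Set<String> variants = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : substitutions.entrySet()) {
            String from = e.getKey();
            String to = e.getValue();
            if (from.isEmpty()) {
                continue; // an empty pattern would match everywhere
            }
            int idx = word.indexOf(from);
            while (idx >= 0) {
                variants.add(word.substring(0, idx) + to + word.substring(idx + from.length()));
                idx = word.indexOf(from, idx + 1);
            }
        }
        return variants;
    }

    /**
     * Runs the three steps in order and stops at the first step that
     * yields suggestions. treeSearch stands in for the (assumed) costly
     * tree search of the speller.
     */
    static List<String> suggest(String word,
                                Set<String> dictionary,
                                Map<String, String> substitutions,
                                Function<String, List<String>> treeSearch) {
        // Step 1: tree search on the misspelled word itself.
        List<String> found = new ArrayList<>(treeSearch.apply(word));
        if (!found.isEmpty()) {
            return found;
        }
        // Step 2: cheap dictionary lookup of each variant.
        Set<String> variants = substitutionVariants(word, substitutions);
        for (String variant : variants) {
            if (dictionary.contains(variant)) {
                found.add(variant);
            }
        }
        if (!found.isEmpty()) {
            return found;
        }
        // Step 3: costly tree search on each variant, duplicates removed.
        Set<String> merged = new LinkedHashSet<>();
        for (String variant : variants) {
            merged.addAll(treeSearch.apply(variant));
        }
        return new ArrayList<>(merged);
    }
}
```

To try step 2) first, as mentioned above, it would be enough to swap the
first two blocks in suggest().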

Would this approach be more or less efficient? It depends on the kind and
number of errors we find in the texts. When the only errors in a word are
one or more multiple character substitutions, it will be faster. When a
multiple character substitution is combined with another kind of error, it
will be slower. So the only way to decide is to try both approaches and
see which one performs better statistically.

Note that handling multiple character substitutions inside the tree search
algorithm is not as costly as repeating the tree search, but it is not
free either: its cost lies somewhere in between.

Best regards,
Jaume

Attachment: Speller.java
Description: Binary data
