2014/1/2 Marcin Miłkowski <list-addr...@wp.pl>

> I did look at them again and it turns out that most only slowed down the
> process but did not help in creating good suggestions as short fragments
> were not put together the way they should. So I went ahead and removed a
> fair number of the replacement pairs but we still need to find ways to
> limit search if we have too many replacement pairs.
>

That's easy, but the search will be not exhaustive...


> I don't think traversing the automaton directly would be faster because
> we stop searching the automaton early anyway (as soon as the letter is
> not found).


I don't think so. Precisely when the algorithm is looking at the first
letters ("Mi.. ") there are many words to search in the dictionary (with
distance=1: ma..., me..., mi...; ai..., bi..., ci..., ... zi...). As you
progress ("Milkow..."), there are less and less words. So every time we
start again the search with a new candidate I think we lose a lot of time.
Traversing the automaton should be quite faster.

The difficulty in traversing the automaton appears when replacing pairs
with different lengths (one to two, one to three, two to three...). The
management of the matrix where the distances are stored becomes tricky.
When I have some time, I could try again...


> I'm afraid the truth is that we have to be very careful when
> adding replacement pairs, as completely ignoring one-letter replacements
> is not a good idea (at least not for Polish, in which people confuse "h"
> and "ch").
>

 If the replacement h->ch can happen in any context or in most contexts,
then the one-letter replacement is unavoidable.

Regards,
Jaume
------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel

Reply via email to