2014/1/3 Marcin Miłkowski <list-addr...@wp.pl>
> Yes, you are right. I did some profiling and indeed, our main problem is
> that findRepl on line 306 in Speller.java is run zillions of times.
>
> One easy but brutal fix would be to run findRepl only on the original
> word and use replacement pairs only to generate complete candidates.
> This would save a lot of time, but at the cost of quality. Maybe it would
> be the better option until we have a proper traversal routine?
>
> The only way to find out whether the quality of suggestions really drops
> is to run it on common misspellings with and without findRepl on
> wordsToCheck list. Could you make this experiment on Catalan data?
>
>
In that case there will be no suggestions for words with two errors: a
replacement pair plus another error. Handling those was the purpose of
running findRepl on the wordsToCheck list.
I think we can choose a middle ground. It makes no sense to check 6,000
word candidates (none of which exists in the dictionary), but it does make
sense to check three, four or a dozen. With a limit of 4, all the tests I
can think of in Catalan pass, so I suggest a limit of 10 or 15.
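
To make the idea concrete, here is a rough sketch of the cap. It is not the
actual Speller.java code: the class name, the method name searchCapped and
the `search` parameter (a stand-in for the findRepl call) are all made up
for illustration, and I am assuming the candidates generated from
replacement pairs arrive as a plain list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Speller.java.
final class CappedCandidates {

    // Runs the expensive search on at most `limit` entries of wordsToCheck.
    // `search` stands in for findRepl: it maps one candidate word to the
    // suggestions found for it.
    static List<String> searchCapped(List<String> wordsToCheck,
                                     int limit,
                                     Function<String, List<String>> search) {
        List<String> suggestions = new ArrayList<>();
        int n = Math.min(limit, wordsToCheck.size());
        for (int i = 0; i < n; i++) {
            suggestions.addAll(search.apply(wordsToCheck.get(i)));
        }
        return suggestions;
    }
}
```

With a limit of 15, a list of 6,000 candidates triggers only 15 searches.
The sketch simply takes the first entries of the list, so the order in which
the candidates are generated decides which ones get checked.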
Best,
Jaume
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel