Hi all,

If everything goes well, there should be an alternative speller 
available for all languages with dictionary-based taggers. I have 
ported fsa_spell to Java, at least partially; see here:

http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-stemming/src/main/java/morfologik/stemming/Speller.java?revision=374&view=markup

I don't know yet whether the Java port will be faster than hunspell, but 
the original fsa_spell definitely is. I will run comparisons as soon as 
possible (right now I am not sure the code is really sane). Now, the 
suggestion algorithm used by fsa_spell is somewhat less capable than the 
one in hunspell (for details of the core algorithm, see Oflazer's 
classic paper, http://acl.ldc.upenn.edu/J/J96/J96-1003.pdf), so it was 
complemented by special routines for detecting run-on words (like 
"thisis" instead of "this is") and for restoring diacritics. The latter 
code is obscure, so I will implement it in a different way, one that 
also covers the REP and MAP features of hunspell.
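
To make the idea concrete, here is a minimal sketch of dictionary lookup 
with an edit-distance cut-off plus a naive run-on check. It is not the 
code from Speller.java and not Oflazer's exact algorithm: a plain trie 
stands in for the automaton, the distance is plain Levenshtein (no 
transpositions), and the names SpellSketch, suggest and splitRunOn are 
made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: a plain trie stands in for the real automaton. */
class SpellSketch {
    /** One trie node; a minimized FSA would also share common suffixes. */
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean isFinal;
    }

    final Node root = new Node();

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.isFinal = true;
    }

    boolean contains(String word) {
        Node n = root;
        for (int i = 0; i < word.length() && n != null; i++) {
            n = n.next.get(word.charAt(i));
        }
        return n != null && n.isFinal;
    }

    /** Returns all dictionary words within maxDist edits of word, sorted. */
    TreeSet<String> suggest(String word, int maxDist) {
        TreeSet<String> out = new TreeSet<>();
        int[] row = new int[word.length() + 1];
        for (int i = 0; i < row.length; i++) {
            row[i] = i; // distance from the empty prefix to word[0..i)
        }
        walk(root, new StringBuilder(), row, word, maxDist, out);
        return out;
    }

    /** Depth-first walk that carries one row of the edit-distance table. */
    private void walk(Node n, StringBuilder prefix, int[] prev,
                      String word, int maxDist, TreeSet<String> out) {
        if (n.isFinal && prev[word.length()] <= maxDist) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : n.next.entrySet()) {
            char c = e.getKey();
            int[] row = new int[prev.length];
            row[0] = prev[0] + 1;
            int min = row[0];
            for (int i = 1; i < row.length; i++) {
                int cost = word.charAt(i - 1) == c ? 0 : 1;
                row[i] = Math.min(Math.min(row[i - 1] + 1, prev[i] + 1),
                                  prev[i - 1] + cost);
                min = Math.min(min, row[i]);
            }
            // Cut-off: if the whole row exceeds maxDist, no extension
            // of this prefix can come back within the limit.
            if (min <= maxDist) {
                prefix.append(c);
                walk(e.getValue(), prefix, row, word, maxDist, out);
                prefix.setLength(prefix.length() - 1);
            }
        }
    }

    /** Run-on check: try every split point, keep splits of two known words. */
    List<String> splitRunOn(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (contains(left) && contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }
}
```

With the words "this", "is", "his" and "thin" added, suggest("thiz", 1) 
returns [thin, this] and splitRunOn("thisis") returns [this is]. The 
point of the cut-off is that whole branches of the dictionary are 
abandoned early instead of comparing the input against every word.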

Now, the finite-state speller does not have to rely on the tagger 
dictionary at all: there is conversion code, used in another project, 
that can turn a hunspell dictionary into a finite-state machine 
(http://hfst.svn.sourceforge.net/viewvc/hfst/trunk/conversion-scripts/), 
but I haven't tried it yet. There is also a shell script that takes 
several hours to run (on the German dictionary) but produces a complete 
word list from the hunspell dictionary. In a nutshell, a finite-state 
speller should be able to recognize all the words contained in the 
hunspell dictionary, and it would probably suggest most of the 
corrections that hunspell offers as well.

None of this means I want to remove the hunspell code from LanguageTool; 
on the contrary. I just think some languages will not need any of its 
advanced features, and we should simply have faster algorithms on offer.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
