On 2012-05-30 06:52, Ruud Baars wrote:
> What about compounding and compounding rules?
They are useful to create a wordlist, but the wordlist is finite anyway,
so it can be represented by the finite-state dictionary (actually, it
could also represent an infinite dictionary, but we have no simple way
to create such a binary automaton, i.e. the tool for that would have to
be written). If it turns out that hunspell is better, there is no reason
to settle for the finite-state speller. But for many languages hunspell
is overkill.

M.

> Marcin Miłkowski <list-addr...@wp.pl> wrote:
>
>> Hi all,
>>
>> if everything goes well, there should be an alternative speller
>> available for all languages with dictionary-based taggers. I succeeded
>> in porting fsa_spell to Java -- partially, see here:
>>
>> http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-stemming/src/main/java/morfologik/stemming/Speller.java?revision=374&view=markup
>>
>> I don't know if it will be faster than hunspell but fsa_spell definitely
>> is. I will make comparisons as soon as it is possible (right now I am
>> not sure if the code is really sane). Now, the suggestion algorithm used
>> by fsa_spell is somewhat less capable than in hunspell (for details of
>> the core algorithm, you can see Oflazer's classical paper,
>> http://acl.ldc.upenn.edu/J/J96/J96-1003.pdf), so it was somewhat
>> complemented by special routines for detection of run-on words (like
>> thisis instead of "this is") and for restoration of diacritics. The
>> latter code is obscure and I will implement it in a different way, also
>> for the REP and MAP features of hunspell.
>>
>> Now, the finite-state speller does not have to rely on the tagger
>> dictionary at all; there is a conversion code used for another project
>> that can turn the hunspell dictionary into a finite-state machine
>> (http://hfst.svn.sourceforge.net/viewvc/hfst/trunk/conversion-scripts/ )
>> - but I haven't tried it yet.
>> There is also a shell script that takes
>> several hours to process (on the German dictionary) but spits out a
>> complete wordlist out of the hunspell dictionary. In a nutshell,
>> finite-state speller should be able to recognize all the words contained
>> in the hunspell dictionary, and it would probably suggest most of the
>> corrections that hunspell displays as well.
>>
>> All that does not mean I want to remove hunspell code from LanguageTool;
>> on the contrary; I just think some languages will not need any advanced
>> features of it at all, and we should simply have faster algorithms on offer.
>>
>> Regards,
>> Marcin
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
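
To make the finite-state idea in this thread more concrete, here is a minimal Python sketch of a dictionary speller built from a finite wordlist. It rests on several assumptions: it uses a plain (unminimized) trie rather than the binary automaton fsa_spell works with, it finds suggestions with an ordinary bounded Levenshtein search over the trie rather than Oflazer's exact cut-off algorithm, and all names (`build_trie`, `contains`, `suggest`, `split_run_on`) are hypothetical and are not taken from fsa_spell, morfologik or hunspell. It only illustrates the two behaviours mentioned above: suggesting nearby words and detecting run-on words such as thisis.

```python
class TrieNode:
    """One state of the dictionary; `final` marks the end of a word."""
    __slots__ = ("children", "final")

    def __init__(self):
        self.children = {}
        self.final = False


def build_trie(words):
    """Build a trie (an unminimized acyclic automaton) from a finite wordlist."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.final = True
    return root


def contains(root, word):
    """Return True if `word` is accepted by the dictionary."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.final


def suggest(root, word, max_dist=1):
    """Return dictionary words within `max_dist` Levenshtein edits of `word`.

    The trie is walked depth-first while one row of the edit-distance
    table is kept per visited state; a branch is abandoned as soon as no
    cell in its row is within the allowed distance.
    """
    results = []
    first_row = list(range(len(word) + 1))

    def walk(node, prefix, prev_row):
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for i in range(1, len(word) + 1):
                cost = 0 if word[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,          # insertion
                               prev_row[i] + 1,         # deletion
                               prev_row[i - 1] + cost)) # substitution/match
            if child.final and row[-1] <= max_dist:
                results.append((row[-1], prefix + ch))
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(root, "", first_row)
    # Closest candidates first, ties broken alphabetically.
    return [w for _, w in sorted(results)]


def split_run_on(root, word):
    """Return every split of `word` into two dictionary words."""
    return [(word[:i], word[i:])
            for i in range(1, len(word))
            if contains(root, word[:i]) and contains(root, word[i:])]
```

With the toy wordlist `["this", "is", "his", "thin", "speller"]`, `suggest(root, "thiz")` returns `["thin", "this"]` and `split_run_on(root, "thisis")` returns `[("this", "is")]`. A real speller would also need the diacritics restoration and the REP/MAP-style replacements discussed above, which this sketch leaves out.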