Sounds good. I think the smart compound detection of Hunspell would make it possible to generate a words list.
Hard to get all these words checked and tagged, though. I will see. It takes more time than I have available.

Marcin Miłkowski <list-addr...@wp.pl> wrote:

>On 2012-05-30 08:51, Ruud Baars wrote:
>> Detection of run-on words is very dangerous when compounding is not
>> supported. It will suggest spaces in words where they should not be.
>> Only complex compounding support or a list of millions of words can prevent
>> that.
>
>Of course, I am thinking of *huge* word lists, in terms of tens of millions of
>forms.
>
>Anyway, this kind of speller works for Finnish, whereas hunspell is too
>limited (hfst library is used by voikko). Enough said?
>
>> Ruud Baars <baar...@xs4all.nl> wrote:
>>
>>> What about compounding and compounding rules?
>>>
>>> Marcin Miłkowski <list-addr...@wp.pl> wrote:
>>>
>>>> Hi all,
>>>>
>>>> if everything goes well, there should be an alternative speller
>>>> available for all languages with dictionary-based taggers. I succeeded
>>>> in porting fsa_spell to Java -- partially, see here:
>>>>
>>>> http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-stemming/src/main/java/morfologik/stemming/Speller.java?revision=374&view=markup
>>>>
>>>> I don't know if it will be faster than hunspell but fsa_spell definitely
>>>> is. I will make comparisons as soon as it is possible (right now I am
>>>> not sure if the code is really sane). Now, the suggestion algorithm used
>>>> by fsa_spell is somewhat less capable than the one in hunspell (for details of
>>>> the core algorithm, you can see Oflazer's classical paper,
>>>> http://acl.ldc.upenn.edu/J/J96/J96-1003.pdf), so it was somewhat
>>>> complemented by special routines for detection of run-on words (like
>>>> thisis instead of "this is") and for restoration of diacritics. The
>>>> latter code is obscure and I will implement it in a different way, also for
>>>> the REP and MAP features of hunspell.
>>>>
>>>> Now, the finite-state speller does not have to rely on the tagger
>>>> dictionary at all; there is conversion code used for another project
>>>> that can turn the hunspell dictionary into a finite-state machine
>>>> (http://hfst.svn.sourceforge.net/viewvc/hfst/trunk/conversion-scripts/ )
>>>> - but I haven't tried it yet. There is also a shell script that takes
>>>> several hours to run (on the German dictionary) but spits out a
>>>> complete wordlist from the hunspell dictionary. In a nutshell, a
>>>> finite-state speller should be able to recognize all the words contained
>>>> in the hunspell dictionary, and it would probably suggest most of the
>>>> corrections that hunspell displays as well.
>>>>
>>>> All that does not mean I want to remove hunspell code from LanguageTool;
>>>> on the contrary, I just think some languages will not need any of its advanced
>>>> features at all, and we should simply have faster algorithms on
>>>> offer.
>>>>
>>>> Regards,
>>>> Marcin
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>> will include endpoint security, mobile security and the latest in malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> Languagetool-devel mailing list
>>>> Languagetool-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
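
To make the run-on concern in this thread concrete, here is a minimal Python sketch of naive run-on detection against a plain word list. It is not the fsa_spell or morfologik code; the function names `split_runon` and `suggest` and the toy English lexicons are assumptions made up for illustration only.

```python
def split_runon(word, lexicon):
    """Return every (left, right) split of `word` whose halves are both in `lexicon`."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in lexicon and word[i:] in lexicon
    ]


def suggest(word, lexicon):
    """Suggest 'left right' corrections for a word the lexicon does not accept."""
    if word in lexicon:
        return []  # the word is known, so nothing is suggested
    return [f"{left} {right}" for left, right in split_runon(word, lexicon)]


# A genuine run-on: both halves are known, the joined form is not.
print(suggest("thisis", {"this", "is"}))

# A valid compound missing from the list gets a wrong split suggested.
print(suggest("bookcase", {"book", "case"}))

# The same compound is left alone once the list contains it.
print(suggest("bookcase", {"book", "case", "bookcase"}))
```

The second call shows the hazard described above: without compounding support, a legitimate compound looks exactly like a run-on, and only a word list large enough to contain the compound itself suppresses the bogus suggestion.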