Marcin, where would you get the Dutch spelling dictionary from? Exploding the Dutch spelling dictionary into a full word list would create a monster of a list containing wrong Dutch words.
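To make "exploding" concrete: expanding an affix-compressed dictionary means writing out every form that each stem's flags allow. This is a minimal Java sketch under stated assumptions. The rule format is a hypothetical simplification loosely modeled on Hunspell SFX entries (strip, add, condition). The stems and flags are toy English data, not anything from the Dutch dictionary. Prefixes, compounding and cross-products are left out. It shows one way wrong forms can appear: a rule whose condition is too permissive produces non-words once every combination is enumerated. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExplodeSketch {
    // Hypothetical, simplified suffix rule loosely modeled on a Hunspell SFX line:
    // remove `strip` from the end of the stem, append `add`, but only if the
    // stem ends with `condition` (real conditions are character-class patterns).
    record SuffixRule(String strip, String add, String condition) {
        boolean appliesTo(String stem) {
            return stem.endsWith(strip) && stem.endsWith(condition);
        }

        String applyTo(String stem) {
            return stem.substring(0, stem.length() - strip.length()) + add;
        }
    }

    // Write out the stem plus every form its flags allow ("exploding").
    // Stems are visited in sorted order so the output is deterministic.
    static List<String> explode(Map<String, String> dic,
                                Map<Character, List<SuffixRule>> aff) {
        List<String> forms = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(dic).entrySet()) {
            String stem = entry.getKey();
            forms.add(stem);
            for (char flag : entry.getValue().toCharArray()) {
                for (SuffixRule rule : aff.getOrDefault(flag, List.of())) {
                    if (rule.appliesTo(stem)) {
                        forms.add(rule.applyTo(stem));
                    }
                }
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        // Toy rules with empty conditions, i.e. deliberately too permissive.
        Map<Character, List<SuffixRule>> aff = Map.of(
                'D', List.of(new SuffixRule("", "ed", "")),
                'G', List.of(new SuffixRule("", "ing", "")));
        Map<String, String> dic = Map.of("walk", "DG", "run", "DG");
        System.out.println(explode(dic, aff));
    }
}
```

Running main prints [run, runed, runing, walk, walked, walking]. "runed" and "runing" are exactly the kind of entries a fully exploded list carries unless every rule condition is precise.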
I could give you one, but it would be either a small list with open content or a big one that may not be made public (yet).

Ruud

On 30-05-12 20:55, Marcin Miłkowski wrote:
> On 2012-05-30 11:38, Ruud Baars wrote:
>> Sounds good. I think the smart compound detection of Hunspell would make it
>> possible to generate a word list.
>>
>> It will be hard to get all these words checked and tagged, though.
>> I will see. It takes more time than I have available.
> I will take care of the dictionary creation process myself, and the
> language maintainers will only see the results (you will be able to
> compare different spelling backends).
>
> Regards
> Marcin
>
>> Marcin Miłkowski <list-addr...@wp.pl> wrote:
>>
>>> On 2012-05-30 08:51, Ruud Baars wrote:
>>>> Detection of run-on words is very dangerous when compounding is not
>>>> supported. It will suggest spaces in words where they should not be.
>>>> Only complex compounding support or a list of millions of words can
>>>> prevent that.
>>> Of course, I am thinking of *huge* word lists, in the range of tens of
>>> millions of forms.
>>>
>>> Anyway, this kind of speller works for Finnish, whereas hunspell is too
>>> limited (the hfst library is used by voikko). Enough said?
>>>
>>>> Ruud Baars <baar...@xs4all.nl> wrote:
>>>>
>>>>> What about compounding and compounding rules?
>>>>>
>>>>> Marcin Miłkowski <list-addr...@wp.pl> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> if everything goes well, there should be an alternative speller
>>>>>> available for all languages with dictionary-based taggers. I have
>>>>>> partially succeeded in porting fsa_spell to Java; see here:
>>>>>>
>>>>>> http://morfologik.svn.sourceforge.net/viewvc/morfologik/morfologik-stemming/trunk/morfologik-stemming/src/main/java/morfologik/stemming/Speller.java?revision=374&view=markup
>>>>>>
>>>>>> I don't know whether it will be faster than hunspell, but fsa_spell
>>>>>> definitely is. I will make comparisons as soon as possible (right now
>>>>>> I am not sure the code is really sane).
>>>>>> Now, the suggestion algorithm used by fsa_spell is somewhat less
>>>>>> capable than hunspell's (for details of the core algorithm, see
>>>>>> Oflazer's classic paper, http://acl.ldc.upenn.edu/J/J96/J96-1003.pdf),
>>>>>> so it was supplemented with special routines for detecting run-on words
>>>>>> (like "thisis" instead of "this is") and for restoring diacritics. The
>>>>>> latter code is obscure, and I will implement it in a different way,
>>>>>> also covering the REP and MAP features of hunspell.
>>>>>>
>>>>>> Now, the finite-state speller does not have to rely on the tagger
>>>>>> dictionary at all; there is conversion code, used for another project,
>>>>>> that can turn a hunspell dictionary into a finite-state machine
>>>>>> (http://hfst.svn.sourceforge.net/viewvc/hfst/trunk/conversion-scripts/),
>>>>>> but I haven't tried it yet. There is also a shell script that takes
>>>>>> several hours to run (on the German dictionary) but produces a
>>>>>> complete word list from the hunspell dictionary. In a nutshell, the
>>>>>> finite-state speller should be able to recognize all the words contained
>>>>>> in the hunspell dictionary, and it would probably suggest most of the
>>>>>> corrections that hunspell displays as well.
>>>>>>
>>>>>> All that does not mean I want to remove the hunspell code from
>>>>>> LanguageTool; on the contrary, I just think some languages will not
>>>>>> need any of its advanced features at all, and we should simply offer
>>>>>> faster algorithms.
>>>>>>
>>>>>> Regards,
>>>>>> Marcin
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Live Security Virtual Conference
>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>>> will include endpoint security, mobile security and the latest in malware
>>>>>> threats.
>>>>>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>> _______________________________________________
>>>>>> Languagetool-devel mailing list
>>>>>> Languagetool-devel@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
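Ruud's objection to run-on detection is easy to see in a naive version of it. A minimal Java sketch, assuming the simplest possible strategy (not necessarily what fsa_spell does): for an unknown token, propose "left right" at every split point where both halves are dictionary words. The class and method names are hypothetical, and the tiny word set is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RunOnSketch {
    // For a token not in the dictionary, propose "left right" at every
    // split point where both halves are known words.
    static List<String> runOnSuggestions(String token, Set<String> dictionary) {
        List<String> suggestions = new ArrayList<>();
        if (dictionary.contains(token)) {
            return suggestions; // known word: nothing to split
        }
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                suggestions.add(left + " " + right);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Illustrative word list; note that the compound "sunglasses" is missing.
        Set<String> dict = Set.of("this", "is", "sun", "glasses");
        System.out.println(runOnSuggestions("thisis", dict));
        System.out.println(runOnSuggestions("sunglasses", dict));
    }
}
```

With this word set, "thisis" yields "this is", which is the intended behavior, but "sunglasses" yields "sun glasses" only because the compound is missing from the list. Adding "sunglasses" to the set makes the suggestion disappear, which is why compound-aware lookup or a very large word list is needed before run-on suggestions are safe.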
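Oflazer's paper describes error-tolerant recognition: walk the dictionary automaton depth-first, keep edit-distance values for the path so far, and abandon any path that can no longer come within the allowed distance. A minimal Java sketch of that idea, with two loudly stated assumptions: a plain trie stands in for the finite-state automaton that fsa_spell works on, and the cut-off is simply the minimum of the current Levenshtein row, a looser but still safe simplification of Oflazer's cut-off edit distance. The class and method names are hypothetical and are not morfologik's Speller API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TrieSpellSketch {
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    // All stored words within maxDist insertions/deletions/substitutions of input.
    List<String> suggest(String input, int maxDist) {
        List<String> out = new ArrayList<>();
        int[] firstRow = new int[input.length() + 1];
        for (int j = 0; j < firstRow.length; j++) {
            firstRow[j] = j; // distance from the empty path to input[0..j)
        }
        StringBuilder prefix = new StringBuilder();
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), prefix, input, firstRow, maxDist, out);
        }
        return out;
    }

    // row[j] = edit distance between the current trie path and input[0..j).
    private void walk(Node node, char c, StringBuilder prefix, String input,
                      int[] prev, int maxDist, List<String> out) {
        prefix.append(c);
        int[] row = new int[prev.length];
        row[0] = prev[0] + 1;
        int rowMin = row[0];
        for (int j = 1; j < row.length; j++) {
            int subst = prev[j - 1] + (input.charAt(j - 1) == c ? 0 : 1);
            row[j] = Math.min(subst, Math.min(row[j - 1] + 1, prev[j] + 1));
            rowMin = Math.min(rowMin, row[j]);
        }
        if (node.isWord && row[row.length - 1] <= maxDist) {
            out.add(prefix.toString());
        }
        // Cut-off: no extension of this path can beat the row minimum.
        if (rowMin <= maxDist) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), prefix, input, row, maxDist, out);
            }
        }
        prefix.setLength(prefix.length() - 1);
    }

    public static void main(String[] args) {
        TrieSpellSketch speller = new TrieSpellSketch();
        for (String w : List.of("this", "is", "his", "thin", "think", "the")) {
            speller.add(w);
        }
        System.out.println(speller.suggest("thsi", 2));
    }
}
```

With maxDist 2 and this toy word list, "thsi" yields [the, thin, this]; "his", "is" and "think" are at distance 3 and are not reported. Swapping two adjacent letters, which spelling correctors commonly count as a single edit, is not handled here, so "thsi" costs two edits to reach "this".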