Dear Asmo,

Yes, 12239 words are really almost nothing for a language like Finnish. In my experience, spell checking is very poor with fewer than 100 thousand words.
I would first exclude the words Soikko has. You need a healthy word collection. Then you must set up a word tree. The Hungarian word tree contains:

- nouns
- nouns with genitive ending ja-je
- nouns with genitive ending se
- nouns with �,� but to be inflected with e, like f�
- nouns, deep
- nouns, names
- nouns, geographical
- nouns, morphological type 1
- nouns, morphological type 2
- verbs with object
- verbs without object
- verbs, morphological type 1
- verbs, morphological type 2
- adjectives
- adjectives with genitive ending ja-je
- adjectives with genitive ending se
- adjectives, deep
- adjectives with �,� but to be inflected with e
- adjectives, morphological type 1
- adjectives, morphological type 2

The names above are files, each containing the words of that word class. The morphological types are in fact exception classes.

Once you have a small starting tree, start putting together the possible endings: first for verbs, which are the simpler ones, then for nouns and adjectives, which are almost the same. Then write awk or perl scripts (or something similar) to generate the affixes and the flags. The Hungarian tree uses m4 macros to keep similar types of endings in the same place. For affix and flag building it uses awk. Once you have put something together, check the result with something like the unmunch program of myspell.

You can find the Hungarian tree on magyarispell.sf.net.

When you have put together a small starting tree, I can look at it and see whether it looks useful. Please write when you have got that far. I do not speak Finnish, but I know the Hungarian tree, and that might be helpful for you.

I think choosing C++ for the affix generator is not a very fortunate decision. The Estonian one uses perl, which I believe is more effective, since it is interpreted.

It is a very interesting project, and I wish you a lot of joy with it.

Eleonora

On Wednesday, 16 February 2005 at 10:33, Asmo Koskinen wrote:
> On Friday 11 February 2005 19:21, eleonora wrote:
> > Enhanced myspell covers really everything. You just need to prepare the
> > right affixes, which is - admittedly - not easy, but it can be done.
> >
> > Good luck, Eleonora
>
> Eleonora,
>
> it seems that you are right ;-). Pauli Virtanen, who made the
> tmispell program for Soikko, answered me on another mailing list. He says
> that the problem is not the affix rules, but the lack of Finnish words. He
> says that there are now only 12239 Finnish words for the affix rules. Soikko
> has about 100 000 words!?
>
> Pauli developed ispell for Finnish a long time ago (September 2000):
>
> http://ispell-fi.sourceforge.net/CHANGELOG
> http://ispell-fi.sourceforge.net/index.html
>
> Eleonora, please,
>
> what do I have to do to start a Finnish Myspell project? I have read this page:
>
> http://lingucomponent.openoffice.org/dictionary.html
>
> What else do I have to know to start?
> What are the caveats for a Myspell project?
>
> Best regards, Asmo Koskinen.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
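
A minimal sketch of the generate-then-check loop described in the reply above, written in Python purely for illustration (the Hungarian tree itself uses m4 and awk). The ENDINGS table, the flags A and B, and the function names affix_lines and expand are all hypothetical, and the endings are toy English ones chosen only because they are easy to verify by eye. The output follows the SFX line layout of MySpell affix files; expand imitates what unmunch does for suffixes only and is not a replacement for it.

```python
import re

# Hypothetical ending table: flag -> list of (strip, add, condition) triples.
# Toy English endings for illustration only, not Finnish or Hungarian morphology.
ENDINGS = {
    "A": [
        ("", "s", "[^sy]"),          # walk -> walks
        ("", "s", "[aeiou]y"),       # day  -> days
        ("y", "ies", "[^aeiou]y"),   # city -> cities
    ],
    "B": [
        ("", "ed", "[^e]"),          # walk -> walked
        ("", "d", "e"),              # bake -> baked
    ],
}


def affix_lines(endings):
    """Render the ending table as MySpell-style SFX lines."""
    lines = []
    for flag, rules in sorted(endings.items()):
        # Header line: SFX <flag> <cross-product Y/N> <number of rules>
        lines.append(f"SFX {flag} Y {len(rules)}")
        for strip, add, cond in rules:
            # Rule line: SFX <flag> <strip> <add> <condition>; 0 means "nothing"
            lines.append(f"SFX {flag} {strip or '0'} {add or '0'} {cond}")
    return lines


def expand(entry, endings):
    """Unmunch-like check: list every form a 'word/FLAGS' entry produces."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        for strip, add, cond in endings.get(flag, []):
            # The condition must match at the end of the base word.
            if re.search(f"(?:{cond})$", word) and word.endswith(strip):
                stem = word[: len(word) - len(strip)]
                forms.append(stem + add)
    return forms


if __name__ == "__main__":
    print("\n".join(affix_lines(ENDINGS)))
    for entry in ["walk/AB", "city/A", "bake/B"]:
        print(entry, "->", expand(entry, ENDINGS))
```

Keeping the endings in one table and deriving both the affix lines and the expanded forms from it means a mistake in an ending shows up immediately in the expanded word list, which is the point of checking with something like unmunch.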
