We'd love to collaborate with anybody else interested in it, or to get feedback on our approach.
Here's some background, and then our approach, if you are interested.
Turkish is an "agglutinative" language, like Finnish, Estonian, Hungarian, Japanese, and Korean. That means that suffixes convey a lot more information than in Indo-European languages, and that any complete list of "surface forms" of words has to be enormously longer. The suffix trees are big but quite regular, so Turkish fits reasonably well into Aspell's structure (it would fit better into Hunspell, but for various reasons we can't go there). There's a good implementation of Aspell for Finnish which proves the concept.
We hope to take the existing Turkish Aspell word list, or maybe even a longer word list if we have time to generate one, and apply a stemmer to it to come up with a list of the stem forms it represents. We'll then connect those stems with the tables of suffixes we've collected from the web.
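To make the plan a bit more concrete, here is a rough sketch in Python of the stem-extraction step. Everything in it is an assumption made for illustration: the suffix table, the flag letters A and B, and the function names are hypothetical, and the "stemmer" is just longest-suffix stripping, not a real Turkish stemmer (it ignores vowel harmony and consonant changes such as kitap/kitabı). The output uses the "stem/FLAGS" word-list convention that we understand affix-compressed word lists to use.

```python
# Hypothetical suffix table: flag letter -> suffixes carrying that flag.
# Illustrative only; a real table would be far larger and conditional.
SUFFIXES = {
    "A": ["lar", "ler"],  # plural
    "B": ["da", "de"],    # locative
}


def split_suffix(word, suffixes, min_stem=2):
    """Strip the longest known suffix; return (stem, flag) or (word, None)."""
    best_stem, best_flag, best_len = word, None, 0
    for flag, forms in suffixes.items():
        for suffix in forms:
            stem_len = len(word) - len(suffix)
            if (word.endswith(suffix)
                    and stem_len >= min_stem
                    and len(suffix) > best_len):
                best_stem, best_flag, best_len = word[:stem_len], flag, len(suffix)
    return best_stem, best_flag


def build_stem_list(words, suffixes):
    """Map each stem to the set of suffix flags seen with it in the word list."""
    stems = {}
    for word in words:
        stem, flag = split_suffix(word, suffixes)
        flags = stems.setdefault(stem, set())
        if flag is not None:
            flags.add(flag)
    return stems


def format_entries(stems):
    """Render sorted 'stem/FLAGS' lines; stems without flags are left bare."""
    lines = []
    for stem in sorted(stems):
        flags = "".join(sorted(stems[stem]))
        lines.append(f"{stem}/{flags}" if flags else stem)
    return lines


if __name__ == "__main__":
    surface_forms = ["ev", "evler", "evde", "kitap", "kitaplar"]
    for line in format_entries(build_stem_list(surface_forms, SUFFIXES)):
        print(line)
```

With the five sample words, this collapses the list to two entries, ev/AB and kitap/A. One thing we'd need to watch: a flag that bundles both harmony variants (lar and ler) would also accept wrong forms like "evlar" unless the affix rules carry conditions on the stem.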
Does that sound like it will work?
_______________________________________________ Aspell-user mailing list [email protected] http://lists.gnu.org/mailman/listinfo/aspell-user
