[Aspell-user] Anybody working on Turkish?

Ethan Bradford Wed, 28 Jun 2006 17:34:45 -0700

Unless somebody else is nearly there for Turkish, Gokalp and I will probably be working on improving Aspell for Turkish. (We have already been working on it, and are just waiting on some administrivia before we start again.)

We'd love to collaborate with anybody else interested in it, or to get feedback on our approach.

Here's some background, and then our approach, if you are interested.

Turkish is an "agglutinative" language, like Finnish, Estonian, Hungarian, Japanese, and Korean. That means that suffixes convey a lot more information than in Indo-European languages, and that any complete list of "surface forms" of words has to be enormously longer. Though the suffix trees are big, they're quite regular, so Turkish fits reasonably well into Aspell's structure (it fits better into Hunspell, but for various reasons we can't go there). There's a good implementation of Aspell for Finnish which proves the concept.

We hope to take the existing Turkish Aspell word list, or maybe even a longer word list if we have time to generate it, and apply a stemmer to it to come up with a list of the stem forms it represents. We'll connect those up with tables of suffixes we've collected from the web.
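
To make the idea concrete, here is a rough sketch of the stem-plus-suffix grouping step in Python. It is only an illustration under heavy assumptions: the suffix table, the word list, and the helper names (split_word, group_by_stem) are all made up for the example, and the "stemmer" is a naive longest-suffix stripper that ignores vowel harmony, consonant changes, and stacked suffixes. It is not the stemmer we plan to use.

```python
from collections import defaultdict

# Hypothetical toy suffix table (plural and locative endings only).
# A real table would be far larger and would have to model stacked suffixes.
SUFFIXES = ["ler", "lar", "de", "da"]


def split_word(word, suffixes=SUFFIXES, min_stem=2):
    """Return (stem, suffix), stripping the longest matching suffix.

    Words with no matching suffix, or whose remainder would be shorter
    than min_stem characters, are returned unchanged with an empty suffix.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix
    return word, ""


def group_by_stem(words):
    """Map each stem to the set of suffixes seen with it in the word list."""
    stems = defaultdict(set)
    for word in words:
        stem, suffix = split_word(word)
        stems[stem].add(suffix)
    return dict(stems)


# Toy word list: ev (house), evler (houses), evde (in the house),
# kitap (book), kitaplar (books).
words = ["ev", "evler", "evde", "kitap", "kitaplar"]
table = group_by_stem(words)

for stem in sorted(table):
    print(stem, sorted(table[stem]))
```

The output of a step like this (each stem with the set of suffixes observed for it) is roughly what would then have to be matched against the collected suffix tables to decide which affix rules each stem gets.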

Does that sound like it will work?

_______________________________________________
Aspell-user mailing list
http://lists.gnu.org/mailman/listinfo/aspell-user
