Hi again :)
I've chased up a new Vietnamese dictionary being developed by Nguyễn Thái Ngọc Duy, a colleague of mine. He has been working on improving recognition of compound words. Where English adds syllables to create more complex words, Vietnamese adds single-syllable words. Even our simpler 'words' are usually more than one word long, e.g.
    screen    màn hình
Spellchecking is difficult in our language anyway, since there are so many viable single-syllable words. It becomes even more complex when a single "word" is expressed by more than one single-syllable word. So if I type
    màu hình
for "screen", typing a "u" instead of the final "n" in the first word, a spellchecker will not flag that as an error, because "màu" on its own is a valid word in Vietnamese (it means "colour").
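To make the problem concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the tiny lexicon and the function names are hypothetical, and this is not how Duy's code, Hunspell or Aspell actually work. It shows that a syllable-by-syllable check accepts "màu hình", while a check that also knows the compound "màn hình" can flag the pair as a likely typo.

```python
import unicodedata


def nfc(s):
    # Normalise so that precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", s)


# Toy lexicon, invented for this example (not taken from any real word list).
SYLLABLES = {nfc(s) for s in ("màn", "màu", "hình")}
COMPOUNDS = {(nfc("màn"), nfc("hình"))}  # "screen"


def unknown_syllables(text):
    """Naive check: report syllables that are not valid on their own."""
    return [s for s in nfc(text).split() if s not in SYLLABLES]


def one_substitution(a, b):
    """True if a and b have equal length and differ in exactly one character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def near_miss_compounds(text):
    """Compound-aware check: report adjacent pairs that are one substituted
    letter away from a known two-syllable compound."""
    words = nfc(text).split()
    hits = []
    for pair in zip(words, words[1:]):
        if pair in COMPOUNDS:
            continue  # an exact compound is fine
        for known in COMPOUNDS:
            first_ok = pair[0] == known[0] and one_substitution(pair[1], known[1])
            second_ok = pair[1] == known[1] and one_substitution(pair[0], known[0])
            if first_ok or second_ok:
                hits.append((pair, known))
    return hits


print(unknown_syllables("màu hình"))    # nothing flagged by the naive check
print(near_miss_compounds("màu hình"))  # the pair is reported with "màn hình"
```

Such a check can only offer a suggestion, not declare an error, because a pair that resembles a compound may still be a legitimate sequence of two separate words.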
So Duy has been working on better recognition of compound words in our language. Unfortunately, he has run into some nasty technical difficulties, and has not had time to solve them. He has pointed me to his word list [1], which may be more current than the standard Hunspell or Aspell Vietnamese wordlist, and has invited us to use it in any way that would benefit OpenOffice.org.
This word list is based on the 2007 list from the Free Vietnamese Dictionary Project [2]. Wiktionary is also based on this project: it is our best dictionary resource.
Ivan, would this newer word list be useful in your dictionary? Could you use it to update the current word list?
I'm keen to help with making sure we have a viable spellchecker in OpenOffice.org 3.0 Vietnamese, but I'm not sure where to start. :S
from Clytie
Vietnamese Free Software Translation Team
http://vnoss.net/dokuwiki/doku.php?id=projects:l10n
[1] http://dev.gentoo.org/~pclouds/files/words-vi-0.1.tar.gz
[2] http://www.informatik.uni-leipzig.de/~duc/software/misc/vn_words.zip
