UTF-8 spell checking

Markus Kuhn Wed, 24 Mar 2004 11:46:13 -0800

From the GNU Aspell author Kevin Atkinson <kevina at gnu.org>:

Concerning Aspell and UTF-8:


Starting with version 0.60, Aspell fully supports spell checking documents
in UTF-8 or any other encoding that Aspell supports.  The fact that Aspell
is still 8-bit internally can be made completely transparent to the end
user.  This means that Aspell can now support any language that has no
more than 220 distinct characters, including different capitalizations and
accents, _even if_ there is not an existing 8-bit encoding that supports
the language.  All one has to do is create a new character data file,
which is a fairly simple task.  The internal encoding never has to be seen
by the end-user, including the word list author, since not even the word
list has to be in the same encoding that Aspell uses.
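
To make the 220-character limit concrete, here is a rough Python sketch
of the general idea: collect the distinct characters a word list uses,
check that they fit, and give each one a single-byte internal code.  This
is only an illustration under stated assumptions.  It is not Aspell's
actual implementation, and it does not reproduce the real character data
file format.  The function names and the choice to start numbering at
0x20 are hypothetical, and the input is assumed to be in precomposed
(NFC) form so that each accented letter is a single code point.

```python
# Illustrative sketch only: not Aspell's code or its character data format.
LIMIT = 220  # maximum number of distinct characters, as quoted above


def build_charmap(words, limit=LIMIT):
    """Assign each distinct character in the word list a one-byte code."""
    # Upper and lower case forms are different characters, so both count.
    chars = sorted({ch for word in words for ch in word})
    if len(chars) > limit:
        raise ValueError(
            f"{len(chars)} distinct characters exceed the limit of {limit}"
        )
    # Hypothetical numbering: start at 0x20 to stay clear of control codes.
    return {ch: 0x20 + i for i, ch in enumerate(chars)}


def to_internal(word, charmap):
    """Encode a Unicode word as a byte string in the 8-bit internal form."""
    return bytes(charmap[ch] for ch in word)


def from_internal(data, charmap):
    """Decode an internal byte string back into a Unicode word."""
    reverse = {code: ch for ch, code in charmap.items()}
    return "".join(reverse[b] for b in data)
```

With a map built from a small Czech word list such as
["čaj", "žába", "Čech"], to_internal("čaj", charmap) yields three bytes
and from_internal() returns the original word, so the 8-bit form never
has to be visible outside the checker.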

GNU Aspell 0.50 supported Unicode to some extent; however, word
lists still had to be in an 8-bit character set.  Furthermore, spell
checking documents in an encoding that is different from the internal
encoding was problematic.

Full UTF-8 support was added with 0.51-20040219.  The next snapshot,
0.51-20040227, fixed a few bugs, while the latest, 0.60-20040317, uses a
new, simpler format for the character data files.

Aspell snapshots can be downloaded from ftp://alpha.gnu.org/gnu/aspell/.

Markus

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
