On Wednesday 19 January 2005 22:20, Petter Reinholdtsen wrote:
> Is there some charset problem? I looked at the
> unknown words for nb, and "går" and "når" are definitely not unknown
> words in the dictionary.

I see the same kind of problem with Dutch.
The unknown wordlist shows 'Brazilië', which is 'Brazilië' in UTF-8
(Dutch for Brazil).
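The garbled form can be reproduced by hand. A minimal sketch, assuming iconv is available; the function name misread_as_latin1 is made up for illustration. It treats UTF-8 bytes as ISO-8859-1 and re-encodes them as UTF-8 for display:

```shell
# Hypothetical helper: interpret stdin as ISO-8859-1 and emit UTF-8.
# Feeding it UTF-8 input reproduces the garbling seen in the word list.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# \303\253 are the two UTF-8 bytes of e-with-diaeresis (U+00EB)
printf 'Brazili\303\253\n' | misread_as_latin1
```

On a UTF-8 terminal this prints 'Brazilië', the same form that shows up in the unknown wordlist.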
I've just checked the aspell Dutch wordlist and Brazilië _is_ included.
$ aspell dump master /usr/lib/aspell/dutch | grep "Brazil"
Braziliaanse
Braziliaans
Braziliaan
Brazilianen
Brazilië
It looks like the dump prints an ISO-8859-1 encoded list.
I think the manpage for aspell gives the answer:
<quote>
--encoding=string
The encoding the input text is in. Valid values are ``utf-8'',
``iso8859-*'', ``koi8-r'', ``viscii'', ``cp1252'', ``machine
!! unsigned 16'', ``machine unsigned 32''. However, the Aspell
!! utility will currently only function correctly with 8-bit encod-
!! ings. utf-8 support is planned for the future. The two ``machine
unsigned'' encodings are intended to be used by other programs
using the Aspell library and it is unlikely the Aspell utility
will ever support these encodings.
</quote>
So it looks as if you may have to iconv the files before you test them
(or, even better, patch aspell so it supports utf-8 ;-)
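The iconv step might look something like this. A minimal sketch, assuming the input is valid UTF-8 and every character in it has an ISO-8859-1 equivalent (iconv stops with an error otherwise); the function name to_latin1 is hypothetical:

```shell
# Hypothetical helper: convert a UTF-8 file to ISO-8859-1 on stdout.
to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1 "$1"
}
```

The result could then be piped into aspell with a matching encoding, e.g. `to_latin1 words.txt | aspell --encoding=iso8859-1 --lang=nl list`, where words.txt and the nl language code are placeholders.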

