On Sat, Mar 03, 2007 at 04:29:15AM -0700, Kevin Atkinson wrote:
> The word list is likely in iso-8859-1 but Aspell expects it in utf-8.

Indeed:

    # file de*
    de_affix.dat:    ISO-8859 text
    de_AT.multi:     ASCII text
    de_AT-only.cwl:  data
    de_CH.multi:     ASCII text
    de_CH-only.cwl:  data
    de-common.cwl:   data
    de.dat:          ASCII text
    de_DE.multi:     ASCII text
    de_DE-only.cwl:  data
    de.multi:        ASCII text
    de_phonet.dat:   ISO-8859 English text
    deutsch.alias:   ASCII text

> Your locale settings _should_ not have an effect here.  What does have an
> effect is the setting in the language data file "de.dat", in particular
> "data-encoding".  See
> http://aspell.net/man-html/The-Language-Data-File.html

From that page:

    data-encoding
        The encoding the language data files are expected to be in as
        well as the default encoding to use when saving the personal
        dictionaries. It can be either `utf-8' or any of the 8-bit
        encoding that Aspell supports. If not set, then it defaults to
        charset.

I hope not to offend, but I found that paragraph a little terse.

 * Should it be: "The encoding *of* the language data files"?
 * "are expected to be in as well as..." Expected to be in what?
 * Should it be: "as well as the default encoding *used* when saving"?

Does this mean that aspell expects the word lists to have the same
charset as the machine? Isn't that a little odd?

de.dat sets 'charset' as iso-8859-1:

    # cat de.dat
    # Generated with Aspell Dicts "proc" script version 0.50.1
    name de
    charset iso-8859-1
    soundslike de
    affix de

Does aspell not use this to determine the charset? If not,
/shouldn't/ it?

I just tried

    /usr/bin/prezip-bin -d < de-common.cwl | /usr/bin/aspell --lang=de create --encoding=iso8859-1 master ./de-common.rws

which completed without any errors, producing de-common.rws.

As it is quite late here in Japan, I don't have any more time tonight
to work on this. A couple of questions: Is this going to conflict with
my machine's character encoding, or has aspell created an rws file for
a utf-8 system? Is the machine character encoding check a feature?
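
One way to look at the encoding question directly, rather than relying on the heuristic guesses of `file`, is to ask `iconv` whether the bytes are valid UTF-8 and to convert them if they are not. The following is only a sketch under stated assumptions: the helper name `is_utf8` and the file names `sample-latin1.txt` and `sample-utf8.txt` are made up for illustration, the small sample stands in for the output of `prezip-bin -d`, and the conversion assumes that anything which is not valid UTF-8 really is ISO-8859-1. Note that plain ASCII also passes the check, since ASCII is a subset of UTF-8.

```sh
#!/bin/sh
# Hypothetical helper (not part of Aspell): succeeds if the file is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence that is illegal
# in the source encoding named with -f.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Build a tiny ISO-8859-1 sample so the sketch is self-contained.
# \374 is the octal escape for byte 0xFC, which is "u umlaut" in Latin-1.
printf 'M\374nchen\n' > sample-latin1.txt

if is_utf8 sample-latin1.txt; then
    echo "sample-latin1.txt: valid UTF-8, nothing to convert"
else
    echo "sample-latin1.txt: not valid UTF-8, converting from ISO-8859-1"
    # Assumption: the source encoding is ISO-8859-1.
    iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt
fi
```

The same check could be run on the decompressed word list before it is piped to `aspell create`. Whether Aspell then still needs `--encoding`, or takes the encoding from de.dat, is exactly the open question above, so this sketch does not settle it.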

It really seems that, since one might attempt to install the same word
list on machines with different character encodings, this is prone to
failure.

-- 
✌
