A friend tried to build Aspell's Swedish dictionary (aspell-sv-1.3.8) on Red Hat Linux 9. Apparently, this Linux distribution sets LANG=en_US.UTF-8 by default and the dictionary is written in ISO8859-1. This is what happened:
  aspell-sv-1.3.8> make all
  ./unsq < words-sv.sq | aspell --local-data-dir=./ --lang=svenska create master ./svenska
  Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 7.
  Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 105.
  Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 106.

This is, of course, quite confusing to a novice. The issue here is not en_US versus sv_SE, but the fact that some ISO8859-1 characters are interpreted as lead bytes of multi-byte UTF-8 sequences.

I think the Makefile should explicitly set the LANG and/or LC_CTYPE environment variables before running "unsq" and "aspell". (Unsq, or "un-squeeze", is a Perl script distributed with aspell-sv-1.3.8.)

Is there an official Aspell policy on moving to UTF-8? Has the Swedish/Danish dictionary team at SSLUG made any decision on this? It appears that Red Hat Linux already made this move in version 8:
http://www.redhat.com/docs/manuals/linux/RHL-8.0-Manual/release-notes/x86/

-- 
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se/

_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel
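The encoding mismatch described above can be reproduced without Aspell or Perl. The Python sketch below is not part of aspell-sv-1.3.8; the helper names and sample words are hypothetical, and Python's error messages differ from Perl's. It shows that ISO8859-1 bytes for Swedish letters are not valid UTF-8, that re-encoding the data as UTF-8 removes the error, and what "explicitly setting LANG and/or LC_CTYPE" could look like for a child process. The choice of the "C" locale is an assumption for illustration, not something the dictionary's Makefile is known to use.

```python
import os


def first_utf8_error(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (every byte is valid ISO-8859-1)."""
    return data.decode("iso-8859-1").encode("utf-8")


def c_locale_env(base=None):
    """Return a copy of an environment with a fixed, non-UTF-8 locale.

    Assumption: "C" is an acceptable locale for the build tools.
    LC_ALL is dropped because, when set, it overrides LANG and LC_CTYPE.
    """
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)
    env["LANG"] = "C"
    env["LC_CTYPE"] = "C"
    return env


if __name__ == "__main__":
    # "ä" is 0xE4 in ISO-8859-1; in UTF-8 that byte announces a 3-byte sequence.
    word = "hä".encode("iso-8859-1")
    print(first_utf8_error(word))                         # byte 0xE4 at end of data
    print(first_utf8_error("räk".encode("iso-8859-1")))   # 0xE4 followed by plain "k"
    print(first_utf8_error(latin1_to_utf8(word)))         # no error after conversion
```

Passing `env=c_locale_env()` to `subprocess.run` would start a command under the fixed locale. In a Makefile, note that a `VAR=value` prefix in a shell pipeline only applies to the command it precedes, so both "unsq" and "aspell" would need it (or the variables would need to be exported).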