A friend tried to build Aspell's Swedish dictionary (aspell-sv-1.3.8)
on Red Hat Linux 9.  Apparently, this Linux distribution sets
LANG=en_US.UTF-8 by default and the dictionary is written in
ISO8859-1.  This is what happened:

aspell-sv-1.3.8> make all
./unsq < words-sv.sq | aspell --local-data-dir=./ --lang=svenska
create master ./svenska
Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 7.
Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 105.
Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 106.

which of course is quite confusing to a novice.

The problem here is not en_US versus sv_SE, but the fact that some
ISO8859-1 characters are interpreted as lead bytes of UTF-8 multi-byte
sequences.  I think the Makefile should explicitly set the LANG and/or
LC_CTYPE environment variables before running "unsq" and "aspell".
(Unsq, or un-squeeze, is a Perl script distributed with aspell-sv-1.3.8.)
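
Until the Makefile sets the locale itself, a possible stopgap is to
override it for the build command only.  The following is a minimal
sketch, not a tested fix: it assumes that forcing the C locale makes
Perl read stdin as plain bytes, and the function name run_in_c_locale
is made up for illustration.

```shell
#!/bin/sh
# Sketch: run a single command under the C (byte-oriented) locale
# without touching the login environment.
# run_in_c_locale is a hypothetical helper name.
run_in_c_locale() {
    # LC_ALL takes precedence over both LC_CTYPE and LANG, so the
    # command no longer sees the en_US.UTF-8 default.
    env LC_ALL=C "$@"
}

# Assumed usage, from the aspell-sv-1.3.8 source directory:
#   run_in_c_locale make all
```

Using env keeps the override local to that one process tree, so the
rest of the shell session still runs in the user's normal locale.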

Is there an official Aspell policy to move to UTF-8?  Has the
Swedish/Danish dictionary team at SSLUG made any decision on this?
It appears that Red Hat Linux already made this move in version 8:
http://www.redhat.com/docs/manuals/linux/RHL-8.0-Manual/release-notes/x86/


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se/



_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel