Lars Aronsson wrote:

> A friend tried to build Aspell's Swedish dictionary (aspell-sv-1.3.8)
> on Red Hat Linux 9. Apparently, this Linux distribution sets
> LANG=en_US.UTF-8 by default and the dictionary is written in
> ISO8859-1. This is what happened:
>
>   aspell-sv-1.3.8> make all
>   ./unsq < words-sv.sq | aspell --local-data-dir=./ --lang=svenska create master ./svenska
>   Malformed UTF-8 character (unexpected end of string) at ./unsq line 51, <stdin> line 7.

[...]

> The matter here is not en_US or sv_SE, but the fact that
> some ISO8859-1 characters are interpreted as prefixes for
> UTF-8 two byte sequences. I think the Makefile should
> explicitly set the LANG and/or LC_CTYPE environment
> variables before running "unsq" and "aspell".

I will do this the next time I package a version of "aspell-sv".

> Is there an official Aspell policy to move to UTF-8?

I don't know. But Aspell already requires each dictionary to state
which character encoding it uses.

> Does the Swedish/Danish dictionary team at SSLUG have any
> decision on this?

Not really. But we have considered switching from ISO-8859-1 to
ISO-10646/UTF-8.

Jacob
-- 
"Human beings just can't not communicate."

_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel
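
For illustration only, here is a minimal Python sketch of the effect described in the quoted text: a byte that is a complete character in ISO-8859-1 can look like the first byte of a multi-byte UTF-8 sequence, so a UTF-8 decoder fails when the expected continuation bytes are missing. The sample word and the helper names (`utf8_lead_length`, `decodes_as_utf8`) are made up for this example. They are not taken from the aspell-sv word list, the "unsq" script, or Aspell itself.

```python
# Hypothetical sample word starting with "Ä" (U+00C4); not from aspell-sv.
word = "\u00c4rlig"

# The same word as it would be stored in an ISO-8859-1 file
# and in a UTF-8 file.
latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")


def utf8_lead_length(byte):
    """Return the sequence length that this byte's bit pattern announces
    when read as a UTF-8 lead byte, or 0 if it is not a lead byte."""
    if byte < 0x80:
        return 1  # plain ASCII, one byte
    if 0xC0 <= byte <= 0xDF:
        return 2  # 110xxxxx: announces a two-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return 3  # 1110xxxx: announces a three-byte sequence
    if 0xF0 <= byte <= 0xF7:
        return 4  # 11110xxx: announces a four-byte sequence
    return 0      # 10xxxxxx continuation byte, or otherwise not a lead


def decodes_as_utf8(data):
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    first = latin1_bytes[0]
    print("first ISO-8859-1 byte: 0x%02X" % first)
    print("announced UTF-8 sequence length:", utf8_lead_length(first))
    print("ISO-8859-1 bytes valid as UTF-8:", decodes_as_utf8(latin1_bytes))
    print("UTF-8 bytes valid as UTF-8:", decodes_as_utf8(utf8_bytes))
```

In this sketch the ISO-8859-1 byte for "Ä" announces a two-byte sequence, but the next byte is the plain ASCII letter "r", so decoding as UTF-8 fails. That matches the kind of "Malformed UTF-8 character" complaint shown in the quoted build output, although the exact bytes involved there are not known from this message.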