On Saturday, December 3, 2005 at 10:39:57 +0800, WANG Xu wrote:

>| $ export LC_ALL=

    The variable still exists, with empty content. This should not
(in theory), but could (in practice), disturb things. To really
remove those variables from the environment, do:

| $ unset LC_ALL LC_CTYPE LANGUAGE
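
    The difference between an empty and an unset variable can be
checked with the POSIX ${VAR+word} expansion. A minimal sketch; the
names state_empty and state_unset are only illustrative:

```shell
# "export LC_ALL=" leaves the variable set, just with empty content.
export LC_ALL=
if [ "${LC_ALL+set}" = set ]; then state_empty="set"; else state_empty="unset"; fi

# "unset" really removes it from the environment.
unset LC_ALL
if [ "${LC_ALL+set}" = set ]; then state_unset="set"; else state_unset="unset"; fi

printf '%s\n' "after export LC_ALL= : $state_empty"
printf '%s\n' "after unset LC_ALL   : $state_unset"
```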


>| LANG=zh_CN.GBK
>| LC_CTYPE=C
> It did not segfault, and print the iso char as:
>| Rog\250\246rio Brito

    Steps:

 -1) Something converted the E9 char from Latin-1 to GBK.
 -2) Mutt asked libc if the result is printable in the current C locale.
 -3) Libc replied that A8 and A6 are not printable in US-Ascii.
 -4) Therefore Mutt printed octal escapes \250\246.
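
    The link between the GBK bytes A8 A6 and the escapes \250\246 is
plain hexadecimal to octal conversion. A minimal sketch, assuming a
printf that accepts hexadecimal constants as numeric arguments (the
variable names are only illustrative):

```shell
# Convert each GBK byte value from hexadecimal to 3-digit octal.
first=$(printf '%03o' 0xA8)
second=$(printf '%03o' 0xA6)

# Print them in the backslash-escaped form seen in the index.
printf '%s\n' "\\$first\\$second"
```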

    The important point here is: How did the original Latin-1 get
converted to GBK? I mean: How could Mutt possibly know that you use a
GBK terminal? There are 2 cases:

 -a) You force "set charset=GBK" in muttrc: Remove this line. $charset
must be free to follow the current locale, whatever it is.

 -b) Some filter does the conversion. Which one?


> Did this mean 1) The library think this char is in GBK charset.

    Well, a discrepancy: One part thinks it's valid GBK, another part
thinks it's invalid US-Ascii. In a correct C setup, the display should
be masked by one "?" (because the E9 byte does not exist in US-Ascii):

| Rog?rio Brito
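
    This masking can be imitated at the shell. This is only a rough
sketch with tr, not what Mutt does internally: every byte outside
printable US-Ascii (octal 040 to 176, plus newline) is replaced by a
question mark. The variable name masked is only illustrative:

```shell
# \351 is the Latin-1 byte E9 (e acute), written in octal for portability.
# In the C locale, tr works byte by byte; -c complements the kept set.
masked=$(printf 'Rog\351rio Brito\n' | LC_ALL=C tr -c '\040-\176\n' '?')
printf '%s\n' "$masked"
```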


> 2) This char cannot be found in my GBK fonts.

    It surely exists in the font, because at the shell you can see a
"lowercase e with an acute accent" (a small slash above the letter e) by
doing:

| printf "\xA8\xA6"

    BTW: On your GBK terminal, is it a "half-width" character like the
Ascii e, or full-width like Chinese chars?


> On Fri, Dec 02, 2005 at 08:40:19PM +0100, Alain Bench wrote:
>> Character U+00BF not existing in GBK, therefore masked by a question
>> mark for you on display, and quoted in your reply. In these
>> conditions, it didn't segfault.
> Sorry for the annoying. And hope we can solve it.

    Not annoying at all: This fact brings 2 small pieces of information
to the puzzle. Precious, like any other details you give us. I still
don't get the full picture, since my best hypothesis so far, "invalid
multibyte", seems dead. Let's dig a little more, if you agree.


 On Saturday, December 3, 2005 at 10:46:03 +0800, WANG Xu wrote:

>>| $ printf "Rog\xA8\xA6rio\n"
>>| Rog[e acute]rio
> This line sent by myself will lead to segfault when I try to review it

    Yet both your mails were perfectly valid UTF-8 mails. So the
segfault does not happen only with ISO-8859-1 chars. Yet another small
piece for the puzzle. :-)


    Hum... Random idea. Could you check:

| $ printf "Rog\xE9rio\n" | iconv -f l1 -t wchar_t | iconv -f wchar_t -t gbk


Bye!    Alain.
-- 
When you want to reply to a mailing list, please avoid doing so from a
digest. This often builds incorrect references and breaks threads.

