On Saturday, December 3, 2005 at 10:39:57 +0800, WANG Xu wrote:

>| $ export LC_ALL=

    The variable still exists, with empty content. In theory this should
not matter, but in practice it could disturb things. To really remove those
variables from the environment, do:

| $ unset LC_ALL LC_CTYPE LANGUAGE


>| LANG=zh_CN.GBK
>| LC_CTYPE=C
> It did not segfault, and print the iso char as:
>| Rog\250\246rio Brito

    Steps:

 -1) Something converted the E9 char from Latin-1 to GBK.
 -2) Mutt asked libc if the result is printable in the current C locale.
 -3) Libc replied that A8 and A6 are not printable in US-Ascii.
 -4) Therefore Mutt printed the octal escapes \250\246.

    The important point here is: How did the original Latin-1 get converted
to GBK? I mean: How could Mutt possibly know that you use a GBK terminal?
There are 2 cases:

 -a) You force "set charset=GBK" in muttrc: Remove this line. $charset
     must be free to follow the current locale, whatever it is.
 -b) Some filter does the conversion. Which one?


> Did this mean 1) The library think this char is in GBK charset.

    Well, there is a discrepancy: One part thinks it's good GBK, another
part thinks it's invalid US-Ascii. In a correct C setup, the character
should be masked on display by one "?" (because the E9 byte does not exist
in US-Ascii):

| Rog?rio Brito


> 2) This char cannot be found in my GBK fonts.

    It surely exists in the font, because at the shell you can see a
"lowercase e with an acute accent" (a small slash above the letter e) by
doing:

| printf "\xA8\xA6"

    BTW: On your GBK terminal, is it a "half-width" character like Ascii e,
or full-width like Chinese chars?


> On Fri, Dec 02, 2005 at 08:40:19PM +0100, Alain Bench wrote:
>> Character U+00BF not existing in GBK, therefore masked by a question
>> mark for you on display, and quoted in your reply. In these
>> conditions, it didn't segfault.
> Sorry for the annoying. And hope we can solve it.

    Not annoying at all: This fact brings 2 small pieces of information to
the puzzle. They are precious, as is any other detail you give us. I still
don't get the full picture, since my best hypothesis so far, "invalid
multibyte", seems dead.
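
    For what it's worth, steps 1 to 4 and the expected "?" masking can be
reproduced outside Mutt. Below is a minimal sketch in Python, not Mutt's or
libc's actual code: the function names are mine, and I assume that
"printable in the C locale" simply means a byte in the range 0x20 to 0x7E.

```python
def latin1_to_gbk(raw: bytes) -> bytes:
    """Step 1: re-encode Latin-1 bytes as GBK.

    Raises UnicodeEncodeError for Latin-1 characters that GBK lacks.
    """
    return raw.decode("latin-1").encode("gbk")


def show_in_c_locale(raw: bytes) -> str:
    """Steps 2-4: keep printable US-ASCII bytes, octal-escape the rest.

    Assumption: printable in the C locale means 0x20 <= byte <= 0x7E.
    """
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else "\\%03o" % b
        for b in raw
    )


def mask_for_ascii(raw: bytes) -> str:
    """A consistent C setup: one '?' per character missing from US-ASCII."""
    text = raw.decode("latin-1")
    return text.encode("ascii", errors="replace").decode("ascii")


name = b"Rog\xe9rio Brito"          # Latin-1, E9 is e acute
gbk = latin1_to_gbk(name)           # E9 becomes the two bytes A8 A6
print(show_in_c_locale(gbk))        # Rog\250\246rio Brito
print(mask_for_ascii(name))         # Rog?rio Brito
```

    The first printed line matches what you saw in Mutt, and the second is
what I would expect if no conversion to GBK happened before the C locale
check.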
Let's dig a little more, if you accept.


 On Saturday, December 3, 2005 at 10:46:03 +0800, WANG Xu wrote:

>>| $ printf "Rog\xA8\xA6rio\n"
>>| Rog[e acute]rio
> This line sent by myself will lead to segfault when I try to review it

    Note, though, that both your mails were perfectly valid UTF-8 mails. So
the segfault does not happen only with ISO-8859-1 chars. Yet another small
piece for the puzzle. :-)

    Hmm... Random idea. Could you check:

| $ printf "Rog\xE9rio\n" | iconv -f l1 -t wchar_t | iconv -f wchar_t -t gbk


Bye!    Alain.
-- 
When you want to reply to a mailing list, please avoid doing so from a
digest. This often builds incorrect references and breaks threads.