On Tue, 20 Nov 2007 17:12:05 +0100 Dr. Werner Fink wrote:
> This small C program shows the problem:

>  #include <locale.h>
>  #include <stdlib.h>
>  #include <stdio.h>
>  int main()
>  {
>          char *str = "\\x81";
>          wchar_t ret = (wchar_t)0;
>          setlocale(LC_CTYPE, "ja_JP.SJIS");
>          mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
>          mbtowc(&ret, str, MB_CUR_MAX);
>          printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>          printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>          return 0;
>  }

> the first printf() shows a Latin-1 Yen sign, whereas the second
> printf(), because of the `l' modifier, goes through wcrtomb() and
> returns the old backslash. At least this happens with glibc.

Our linux.i386-64 only has these Japanese locales, with no ja_JP.SJIS:

        ja_JP
        ja_JP.eucjp
        ja_JP.ujis
        ja_JP.utf8

The \\x81 was a red herring for me, so I changed the string to \\AZ
and modified the test to set the locale from the environment:

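 #include <stdio.h>
 #include <locale.h>
 #include <stdlib.h>
 #include <wchar.h>
 int main(void)
 {
         const char *str = "\\AZ";   /* backslash (0x5C) followed by "AZ" */
         const char *loc = setlocale(LC_ALL, "");   /* locale from the environment */
         wchar_t ret = (wchar_t)0;
         printf("LC_ALL=%s\n", loc ? loc : "(null)");   /* setlocale() may fail */
         mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);   /* reset the shift state */
         mbtowc(&ret, str, MB_CUR_MAX);
         printf("%s\n", str);
         printf("%c (0x%.2X) bytes=%d\n", (int)ret, (unsigned)ret, mblen(str, MB_CUR_MAX));
         printf("%lc (0x%.2X) bytes=%d\n", (wint_t)ret, (unsigned)ret, mblen(str, MB_CUR_MAX));
         return 0;
 }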

For all of the locales above I get the same result (modulo the LC_ALL value):

LC_ALL=ja_JP.ujis
\AZ
\ (0x5C) bytes=1
\ (0x5C) bytes=1

> Does the attachment fit your needs?

Yes, thanks.

_______________________________________________
ast-developers mailing list
https://mailman.research.att.com/mailman/listinfo/ast-developers
