On Tue, 20 Nov 2007 17:12:05 +0100 Dr. Werner Fink wrote:
> This small C program shows the problem:
> #include <locale.h>
> #include <stdlib.h>
> #include <stdio.h>
> int main()
> {
>     char *str = "\\x81";
>     wchar_t ret = (wchar_t)0;
>     setlocale(LC_CTYPE, "ja_JP.SJIS");
>     mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
>     mbtowc(&ret, str, MB_CUR_MAX);
>     printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>     printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>     return 0;
> }
> The first printf() shows a Latin-1 yen sign, whereas the second
> printf(), because of the `l' modifier, uses wcrtomb() and returns
> the original backslash. At least this happens with glibc.
our linux.i386-64 machine has no ja_JP.SJIS; it only has
ja_JP
ja_JP.eucjp
ja_JP.ujis
ja_JP.utf8
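
A quick way to check which locale names a given machine accepts is to try each one with setlocale(). The sketch below is only an illustration and not part of the original test: `locale_available` and `report_locales` are hypothetical helper names. Note also that some C libraries accept names they have no locale data for, so "ok" here only means that setlocale() did not return NULL.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if setlocale() accepts `name` for
 * LC_CTYPE, 0 otherwise. The LC_CTYPE locale in effect on entry is
 * restored before returning. */
int locale_available(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);   /* query only */
    int ok;

    /* Copy the current name: a later setlocale() call may overwrite it. */
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");
    ok = setlocale(LC_CTYPE, name) != NULL;
    setlocale(LC_CTYPE, saved);
    return ok;
}

/* Print one line per candidate name; `names` ends with a NULL entry. */
void report_locales(const char *const *names)
{
    for (; *names; names++)
        printf("%-12s %s\n", *names,
               locale_available(*names) ? "ok" : "missing");
}
```

Calling report_locales() with an array such as { "ja_JP.SJIS", "ja_JP.eucjp", "ja_JP.ujis", "ja_JP.utf8", NULL } lists which of those names the local C library accepts.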
the \\x81 was a red herring for me, so I modified the test to use
"\\AZ" and to set the locale from the environment:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
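int main()
{
    char *str = "\\AZ";    /* backslash (0x5C) followed by "AZ" */
    wchar_t ret = (wchar_t)0;
    /* take the locale from the environment and show what was selected */
    printf("LC_ALL=%s\n", setlocale(LC_ALL, ""));
    /* reset the conversion state, then convert the first character */
    mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
    mbtowc(&ret, str, MB_CUR_MAX);
    printf("%s\n", str);
    /* %c prints ret truncated to one byte; %lc converts ret back to multibyte */
    printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
    printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
    return 0;
}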
for all of the locales above I get the same result (apart from the LC_ALL value):
LC_ALL=ja_JP.ujis
\AZ
\ (0x5C) bytes=1
\ (0x5C) bytes=1
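
To see where the two printf() forms can diverge, it may help to separate the two conversions. The sketch below assumes the behaviour described in the quoted report: %c writes the wide value truncated to a single byte, while %lc goes through a wcrtomb()-style conversion in the current locale. `narrow_c` and `wide_to_bytes` are hypothetical helper names, not part of either test program.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>
#include <wchar.h>

/* What "%c" effectively prints: the value converted to unsigned char. */
unsigned char narrow_c(wchar_t wc)
{
    return (unsigned char)wc;
}

/* Roughly what "%lc" does: convert `wc` to its multibyte form in the
 * current LC_CTYPE locale. `out` must hold at least MB_LEN_MAX bytes.
 * Returns the number of bytes stored, or (size_t)-1 if `wc` has no
 * representation in this locale. */
size_t wide_to_bytes(wchar_t wc, char *out)
{
    mbstate_t st;

    memset(&st, 0, sizeof st);   /* start in the initial shift state */
    return wcrtomb(out, wc, &st);
}
```

If mbtowc() maps the byte 0x5C to the yen sign (0xA5), as reported for ja_JP.SJIS, narrow_c() yields 0xA5 while wide_to_bytes() can map it back to the single byte 0x5C. In the locales tested here mbtowc() returns 0x5C, so both paths agree.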
> Does the attachment fit your needs?
yes, thanks
_______________________________________________
ast-developers mailing list
https://mailman.research.att.com/mailman/listinfo/ast-developers