On Tue, Nov 20, 2007 at 11:57:02AM -0500, Glenn Fowler wrote:
>
> On Tue, 20 Nov 2007 17:12:05 +0100 Dr. Werner Fink wrote:
> > This small C program shows the problem:
>
> > #include <locale.h>
> > #include <stdlib.h>
> > #include <stdio.h>
> > int main()
> > {
> > char *str = "\\x81";
> > wchar_t ret = (wchar_t)0;
> > setlocale(LC_CTYPE, "ja_JP.SJIS");
> > mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
> > mbtowc(&ret, str, MB_CUR_MAX);
> > printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> > printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> > return 0;
> > }
>
> > The first printf() shows a Latin-1 yen sign, whereas the second
> > printf(), because of the `l' modifier, goes through wcrtomb() and
> > returns the old backslash. At least this happens with glibc.
>
> our linux.i386-64 only has
>
> ja_JP
> ja_JP.eucjp
Also called ja_JP.eucJP on glibc based systems.
> ja_JP.ujis
Guess: this is equivalent to ja_JP.sjis or ja_JP.SJIS
on a glibc-based system.
> ja_JP.utf8
Also called ja_JP.UTF8 or ja_JP.UTF-8 on glibc based systems.
> the \\x81 was a red herring for me
> I modified the test to set the locale from the env
>
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> int main()
> {
> char *str = "\\AZ";
> wchar_t ret = (wchar_t)0;
> printf("LC_ALL=%s\n", setlocale(LC_ALL, ""));
> mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
> mbtowc(&ret, str, MB_CUR_MAX);
> printf("%s\n", str);
> printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> return 0;
> }
>
> for all of the locales above I get the same result (modulo the LC_ALL value):
>
> LC_ALL=ja_JP.ujis
> \AZ
> \ (0x5C) bytes=1
> \ (0x5C) bytes=1
Here I get
LC_ALL=ja_JP.SJIS
\AZ
¥ (0xA5) bytes=1
\ (0xA5) bytes=1
... so it seems that `%c' is not equivalent to `%lc' on
a glibc-based system.
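
To make the difference concrete, here is a minimal sketch of the two
output paths, assuming that `%c' converts its int argument to unsigned
char and that `%lc' re-encodes the wide character as if by wcrtomb(),
which is how ISO C describes them. The helper names decode_first(),
as_percent_c() and as_percent_lc() are hypothetical and exist only for
this illustration.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s in the
 * current LC_CTYPE locale. Returns the number of bytes consumed, 0 for
 * the null character, or -1 for an invalid sequence. */
int decode_first(const char *s, wchar_t *out)
{
    mbtowc(NULL, NULL, 0);              /* reset the internal shift state */
    return mbtowc(out, s, MB_CUR_MAX);
}

/* The byte that "%c" writes: the wide value converted to unsigned char. */
unsigned char as_percent_c(wchar_t wc)
{
    return (unsigned char)wc;
}

/* The bytes that "%lc" writes: the wide value re-encoded with wcrtomb(),
 * starting from an initial conversion state. Stores at most MB_LEN_MAX
 * bytes in buf and returns the byte count, or (size_t)-1 on error. */
size_t as_percent_lc(wchar_t wc, char buf[MB_LEN_MAX])
{
    mbstate_t st;

    memset(&st, 0, sizeof st);
    return wcrtomb(buf, wc, &st);
}
```

With the result shown above (ret == 0xA5 under ja_JP.SJIS), as_percent_c()
yields the single byte 0xA5, while as_percent_lc() would be expected to
yield the Shift-JIS encoding of that character, which the `%lc' output
above shows to be the backslash byte 0x5C.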
If you prefer the old version of the mbchar() macro, I'd suggest
using something like
#if defined(__linux__)
... glibc version of mbchar()
#else
... old version of mbchar()
#endif
On the other hand, the glibc version should not hurt too much :)
For the script, I'd like to mention that the locale used
for Shift-JIS should be detected. Maybe something like
case $(uname -o) in
*Linux*) LC_CTYPE=ja_JP.SJIS ;;
*Solaris*) LC_CTYPE=ja_JP.PCK ;;
*xxx*) LC_CTYPE=ja_JP.ujis ;;
*) exit 0
esac
export LC_CTYPE
could help. On a glibc based system the program `locale' may
help with `locale -a | grep -i jis' but I don't know if this
is standard on most UNICES around.
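
An alternative to matching on the system name is to probe for the locale
directly. The following is only a sketch, assuming that setlocale()
returns NULL for a locale name the C library cannot honor. The helper
first_available_locale() is hypothetical, and the candidate names in the
comment are the ones mentioned in this thread, not a complete list.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each candidate name with setlocale() and
 * return the first one the C library accepts, or NULL if none of them
 * is available. On success LC_CTYPE is left set to the returned name.
 *
 * Example candidate list (names taken from this thread):
 *     const char *const names[] = { "ja_JP.SJIS", "ja_JP.PCK" };
 *     const char *found = first_available_locale(names, 2);
 */
const char *first_available_locale(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return names[i];
    }
    return NULL;
}
```

A test driver could skip the Shift-JIS checks when this returns NULL,
much like the `exit 0' branch of the case statement above.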
And the `$(seq ..)' should be replaced by an arithmetic expression,
as mentioned by Roland.
Werner
--
Dr. Werner Fink <[EMAIL PROTECTED]>
SuSE LINUX Products GmbH, Maxfeldstrasse 5, Nuernberg, Germany
GF: Markus Rex, HRB 16746 (AG Nuernberg)
phone: +49-911-740-53-0, fax: +49-911-3206727, www.opensuse.org
------------------------------------------------------------------
"Having a smoking section in a restaurant is like having
a peeing section in a swimming pool." -- Edward Burr