On Tue, Nov 20, 2007 at 11:57:02AM -0500, Glenn Fowler wrote:
> 
> On Tue, 20 Nov 2007 17:12:05 +0100 Dr. Werner Fink wrote:
> > This small C program shows the problem:
> 
> >  #include <locale.h>
> >  #include <stdlib.h>
> >  #include <stdio.h>
> >  int main()
> >  {
> >          char *str = "\\x81";
> >          wchar_t ret = (wchar_t)0;
> >          setlocale(LC_CTYPE, "ja_JP.SJIS");
> >          mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
> >          mbtowc(&ret, str, MB_CUR_MAX);
> >          printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> >          printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
> >          return 0;
> >  }
> 
> > the first printf() shows a latin1 Yen whereas the second
> > printf(), due to the `l' modifier, uses wcrtomb() and returns
> > the old backslash. At least this happens with glibc.
> 
> our linux.i386-64 only has
> 
>       ja_JP
>       ja_JP.eucjp

Also called ja_JP.eucJP on glibc based systems.

>       ja_JP.ujis

Guess: this is equivalent to ja_JP.sjis or ja_JP.SJIS
on a glibc based system.

>       ja_JP.utf8

Also called ja_JP.UTF8 or ja_JP.UTF-8 on glibc based systems.

> the \\x81 was a red herring for me
> I modified the test to set the locale from the env
> 
>  #include <stdio.h>
>  #include <locale.h>
>  #include <stdlib.h>
>  int main()
>  {
>          char *str = "\\AZ";
>          wchar_t ret = (wchar_t)0;
>          printf("LC_ALL=%s\n", setlocale(LC_ALL, ""));
>          mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
>          mbtowc(&ret, str, MB_CUR_MAX);
>          printf("%s\n", str);
>          printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>          printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
>          return 0;
>  }
> 
> for all of the locales above I get the same result (modulo the LC_ALL value):
> 
> LC_ALL=ja_JP.ujis
> \AZ
> \ (0x5C) bytes=1
> \ (0x5C) bytes=1

Here I get

 LC_ALL=ja_JP.SJIS
 \AZ
 ¥ (0xA5) bytes=1
 \ (0xA5) bytes=1

... it seems that `%c' is not equivalent to `%lc' on
a glibc based system.

If you prefer the old version of the mbchar() macro I'd suggest
using something like

  #if defined(__linux__)
  ... glibc version of mbchar()
  #else
  ... old version of mbchar()
  #endif

On the other hand, the glibc version should not hurt too much :)

For the script I'd like to mention that the locale which is used
for Shift-JIS should be detected.  Maybe something like

    case $(uname -o) in
    *Linux*)   LC_CTYPE=ja_JP.SJIS ;;
    *Solaris*) LC_CTYPE=ja_JP.PCK  ;;
    *xxx*)     LC_CTYPE=ja_JP.ujis ;;
    *)         exit 0
    esac
    export LC_CTYPE

could help.  On a glibc based system the program `locale' may
help with `locale -a | grep -i jis' but I don't know if this
is standard on most UNICES around.

And the `$(seq ..)' should be replaced by an arithmetic expression
as mentioned by Roland.


         Werner

-- 
 Dr. Werner Fink <[EMAIL PROTECTED]>
 SuSE LINUX Products GmbH,  Maxfeldstrasse 5,  Nuernberg,  Germany
 GF: Markus Rex,  HRB 16746 (AG Nuernberg)
 phone: +49-911-740-53-0,  fax: +49-911-3206727,  www.opensuse.org
------------------------------------------------------------------
  "Having a smoking section in a restaurant is like having
          a peeing section in a swimming pool." -- Edward Burr
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
