On Tue, Nov 20, 2007 at 10:31:02AM -0500, Glenn Fowler wrote:
> 
> On Tue, 20 Nov 2007 15:55:00 +0100 Dr. Werner Fink wrote:
> > The problem was that the former version of mbchar() used mbtowc()
> > even for plain ASCII characters, which maps the backslash (0x5C)
> > to the latin1 Yen symbol (0xA5).
> 
> does the backslash => latin1 Yen symbol happen because the ksh
> implementation is in an incorrect shift state?
> otherwise I don't understand why a pointer to a 7 bit ascii char
> should be converted (my understanding of shift charsets is minimal)

This small C program shows the problem:

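 #include <locale.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <wchar.h>
 int main(void)
 {
         const char *str = "\\x81";      /* first byte is 0x5C (backslash) */
         wchar_t ret = (wchar_t)0;
         if (setlocale(LC_CTYPE, "ja_JP.SJIS") == NULL)
                 return 1;               /* locale not installed */
         mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);   /* reset shift state */
         mbtowc(&ret, str, MB_CUR_MAX);  /* decode the first character */
         /* %c writes the wide value truncated to a single byte */
         printf("%c (0x%.2X) bytes=%d\n", (int)ret, (unsigned)ret, mblen(str, MB_CUR_MAX));
         /* %lc converts the wide value back to multibyte via wcrtomb() */
         printf("%lc (0x%.2X) bytes=%d\n", (wint_t)ret, (unsigned)ret, mblen(str, MB_CUR_MAX));
         return 0;
 }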

The first printf() shows a Latin-1 Yen sign (byte 0xA5), whereas the
second printf(), because of the `l' modifier, converts the wide
character back with wcrtomb() and therefore prints the original
backslash. At least, this is what happens with glibc.
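One way to avoid this is an ASCII fast path in front of mbtowc(). What follows is a minimal sketch, not the actual patch. mbchar_sketch() is a hypothetical name, not the libast mbchar() macro. The sketch assumes a non-stateful encoding in which a byte below 0x80 never starts a multibyte sequence. That holds for Shift-JIS, EUC and UTF-8, but not for ISO-2022-JP.

```c
#include <stdlib.h>   /* mbtowc, MB_CUR_MAX, NULL, wchar_t */

/*
 * Hypothetical helper: decode the character at *pp, advance *pp past it
 * and return it.  At the terminating NUL it returns 0 without advancing.
 * Assumes bytes below 0x80 are always single characters (see lead-in).
 */
static wchar_t mbchar_sketch(const char **pp)
{
    const unsigned char *p = (const unsigned char *)*pp;
    wchar_t wc;
    int n;

    if (*p == 0)
        return 0;

    /* 7-bit fast path: return the byte itself, so 0x5C stays a
     * backslash no matter what mbtowc() would map it to. */
    if (*p < 0x80) {
        *pp += 1;
        return (wchar_t)*p;
    }

    n = mbtowc(&wc, *pp, MB_CUR_MAX);
    if (n <= 0) {
        /* invalid or truncated sequence: reset the conversion state
         * and treat the byte as a single character */
        mbtowc(NULL, NULL, 0);
        *pp += 1;
        return (wchar_t)*p;
    }
    *pp += n;
    return wc;
}
```

With str = "\\x81" from the program above, the first call returns the backslash (0x5C) and leaves the pointer at "x81". mbtowc() is consulted only for bytes of 0x80 and above, which are the only bytes that can start a two-byte Shift-JIS character.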

> > The patch also includes a test
> > case for Japanese SHIFT-JIS characters which include an ASCII
> > character as second byte.
> 
> could you publish the tests under CPL
> otherwise we won't be able to package them with ast

Does the attachment fit your needs?

     Werner

-- 
 Dr. Werner Fink <[EMAIL PROTECTED]>
 SuSE LINUX Products GmbH,  Maxfeldstrasse 5,  Nuernberg,  Germany
 GF: Markus Rex,  HRB 16746 (AG Nuernberg)
 phone: +49-911-740-53-0,  fax: +49-911-3206727,  www.opensuse.org
########################################################################
#                                                                      #
#   Copyright (c) 2007 SuSE Linux Products GmbH, Nuernberg, Germany    #
#                                                                      #
#   This library is licensed under the Common Public License,          #
#   Version 1.0.  A copy of the License is available at                #
#   http://www.opensource.org/licenses/cpl1.0.txt                      #
#   (with md5 checksum 059e8cd6165cb4c31e351f2b69388fd9)               #
#                                                                      #
#   Alternatively, this software may be distributed under the terms    #
#   of the GNU Lesser General Public License version 2.1 as published  #
#   by the Free Software Foundation.                                   #
#                                                                      #
#   This library is distributed in the hope that it will be useful,    #
#   but WITHOUT ANY WARRANTY; without even the implied warranty of     #
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.               #
#                                                                      #
#   Author: Werner Fink <[EMAIL PROTECTED]>                            #
#                                                                      #
########################################################################

#
# Byte ranges for Shift-JIS encoding (hexadecimal):
# First byte:   81-9F, E0-EF
# Second byte:  40-7E, 80-FC
#
# Now test multibyte characters whose second byte is a
# 7-bit (ASCII) byte, i.e. 0x81 followed by 0x40-0x7E
#

typeset -i chr=0
typeset -i err=0
typeset printf=$(whence -p printf 2>/dev/null)   # path of external printf, if any

unset LC_ALL
unset LC_CTYPE
export LANG=ja_JP.SJIS

for second in $(seq 64 126); do
    : $((chr++))
    second=$(printf '%x' ${second})
    mbchar="$(printf "\x81\x${second}")"
    if test -z "${mbchar}" ; then
        : $((err++))            # ERROR in builtin printf
        continue
    fi
    if test -x "${printf}" ; then
        if test "$(${printf} "\x81\x${second}")" != "${mbchar}" ; then
            : $((err++))        # builtin and external printf disagree
            continue
        fi
    fi
    uq=$(echo ${mbchar})
    dq=$(echo "${mbchar}")
    test "$uq" != "$dq" && let err+=1
    test ${#uq} -ne 1 -o ${#dq} -ne 1 && let err+=1
done

if test $err -ne 0 ; then
    : err_exit
    : err_exit
    print -u2 -n "\t"
    print -u2 -r ${0##*/}[$LINENO]: "Shift-JIS encoding failed"
fi
exit $err
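The following small C sketch illustrates the byte ranges in the script's header comment and the property its loop checks: ${#uq} and ${#dq} should each report one character for every 0x81 0x40..0x7E pair. The function names are hypothetical. The ranges are hard-coded from the comment rather than taken from the C library, so the sketch does not need a ja_JP.SJIS locale. Vendor variants such as CP932 accept more lead bytes than these ranges allow.

```c
#include <stddef.h>   /* size_t */

/* Ranges as listed in the test script's header comment. */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF);
}

static int sjis_is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0x80 && c <= 0xFC);
}

/* Bytes in the character at s: 2 for lead + valid trail, else 1.
 * Reading s[1] is safe because s[0] is non-NUL whenever this is called. */
static size_t sjis_char_len(const unsigned char *s)
{
    if (sjis_is_lead(s[0]) && sjis_is_trail(s[1]))
        return 2;
    return 1;
}

/* Character count of a NUL-terminated string (what ${#var} should give). */
static size_t sjis_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        p += sjis_char_len(p);
        count++;
    }
    return count;
}

/* Offset of the first backslash that is a character of its own, or -1.
 * A 0x5C byte acting as the trail byte of a two-byte character is skipped. */
static long sjis_find_backslash(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        size_t n = sjis_char_len(p);
        if (n == 1 && *p == '\\')
            return (long)(p - (const unsigned char *)s);
        p += n;
    }
    return -1;
}
```

For the pair 0x81 0x5C, sjis_strlen() returns 1 and sjis_find_backslash() returns -1. A scanner that examined single bytes would instead find a backslash at offset 1, which is the kind of misparse the regression test is meant to catch.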
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers