On Tue, Nov 20, 2007 at 10:31:02AM -0500, Glenn Fowler wrote:
>
> On Tue, 20 Nov 2007 15:55:00 +0100 Dr. Werner Fink wrote:
> > The problem was that the former version of mbchar() uses mbtowc()
> > even for real ASCII characters which shifts the backslash (0x5C)
> > to the latin1 Yen symbol (0xA5).
>
> does the backslash => latin1 Yen symbol happen because the ksh
> implementation is in an incorrect shift state?
> otherwise I don't understand why a pointer to a 7 bit ascii char
> should be converted (my understanding of shift charsets is minimal)
This small C program shows the problem:
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    /* The first byte is a plain ASCII backslash (0x5C). */
    char *str = "\\x81";
    wchar_t ret = (wchar_t)0;

    setlocale(LC_CTYPE, "ja_JP.SJIS");
    /* Reset the shift state, then convert the first character. */
    mbtowc((wchar_t*)0, (char*)0, MB_CUR_MAX);
    mbtowc(&ret, str, MB_CUR_MAX);
    printf("%c (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
    printf("%lc (0x%.2X) bytes=%d\n", ret, ret, mblen(str, MB_CUR_MAX));
    return 0;
}
The first printf() shows a Latin-1 Yen sign (0xA5), whereas the
second printf(), because of the `l' modifier, goes through wcrtomb()
and returns the original backslash (0x5C). At least this is what
happens with glibc.
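To illustrate the idea behind the fix, here is a minimal sketch. It is
not the code from the patch: next_char() is a hypothetical helper name,
and the sketch assumes an encoding in which a byte below 0x80 at a
character boundary always stands for itself (true for Shift-JIS, EUC and
UTF-8, not for stateful encodings such as ISO-2022-JP).

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not the real ksh mbchar(): decode the character
 * at *pp and advance *pp past it.  Bytes below 0x80 are returned as
 * they are, so mbtowc() never gets to remap a plain ASCII byte.
 */
wchar_t next_char(const char **pp)
{
    const unsigned char *p = (const unsigned char *)*pp;
    wchar_t wc;
    int len;

    if (*p < 0x80) {            /* 7-bit ASCII: no conversion */
        if (*p != '\0')
            (*pp)++;            /* do not step past the terminator */
        return (wchar_t)*p;
    }
    len = mbtowc(&wc, *pp, MB_CUR_MAX);
    if (len <= 0) {             /* invalid sequence: pass the raw byte */
        (*pp)++;
        return (wchar_t)*p;
    }
    *pp += len;
    return wc;
}
```

With such a fast path a backslash stays 0x5C in every locale, and only
bytes with the high bit set are handed to mbtowc().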
> > The patch also includes a test
> > case for Japanese SHIFT-JIS characters which include an ASCII
> > character as second byte.
>
> could you publish the tests under CPL
> otherwise we won't be able to package them with ast
Does the attachment fit your needs?
Werner
--
Dr. Werner Fink <[EMAIL PROTECTED]>
SuSE LINUX Products GmbH, Maxfeldstrasse 5, Nuernberg, Germany
GF: Markus Rex, HRB 16746 (AG Nuernberg)
phone: +49-911-740-53-0, fax: +49-911-3206727, www.opensuse.org
########################################################################
# #
# Copyright (c) 2007 SuSE Linux Products GmbH, Nuernberg, Germany #
# #
# This library is licensed under the Common Public License, #
# Version 1.0. A copy of the License is available at #
# http://www.opensource.org/licenses/cpl1.0.txt #
# (with md5 checksum 059e8cd6165cb4c31e351f2b69388fd9) #
# #
# Alternatively, this software may be distributed under the terms #
# of the GNU Lesser General Public License version 2.1 as published #
# by the Free Software Foundation. #
# #
# This library is distributed in the hope that it will be useful, #
# but WITHOUT ANY WARRANTY; without even the implied warranty of #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #
# #
# Author: Werner Fink <[EMAIL PROTECTED]> #
# #
########################################################################
#
# Byte ranges for Shift-JIS encoding (hexadecimal):
# First byte: 81-9F, E0-EF
# Second byte: 40-7E, 80-FC
#
# Now test out some multi byte characters which
# include 7bit aka ASCII bytes with 0x81 0x{40-7E}
#
typeset -i chr=0
typeset -i err=0
typeset printf=$(type -p printf 2>/dev/null)
unset LC_ALL
unset LC_CTYPE
export LANG=ja_JP.SJIS
for second in $(seq 64 126); do
    : $((chr++))
    second=$(printf '%x' ${second})
    mbchar="$(printf "\x81\x${second}")"
    if test -z "${mbchar}" ; then
        : $((err++)) # ERROR in builtin printf
        continue
    fi
    if test -x "${printf}" ; then
        if test $(${printf} "\x81\x${second}") != ${mbchar} ; then
            : $((err++)) # ERROR in builtin printf
            continue
        fi
    fi
    uq=$(echo ${mbchar})
    dq=$(echo "${mbchar}")
    test "$uq" != "$dq" && let err+=1
    test ${#uq} -ne 1 -o ${#dq} -ne 1 && let err+=1
done
if test $err -ne 0 ; then
    : err_exit
    : err_exit
    print -u2 -n "\t"
    print -u2 -r ${0##*/}[$LINENO]: "Shift-JIS encoding failed"
fi
exit $err
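
As a side note on the byte ranges listed in the test header, the
following C sketch spells them out. The function names are hypothetical
and chosen only for illustration, and the ranges are the ones from the
comment in the test, not a complete Shift-JIS definition. It shows why
the loop in the test runs over exactly 63 second bytes: those are the
valid trail bytes that are also 7-bit ASCII.

```c
/* Lead byte of a two-byte character (ranges from the test header). */
int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF);
}

/* Valid second byte of a two-byte character. */
int sjis_is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0x80 && c <= 0xFC);
}

/* Count the valid second bytes that are also 7-bit ASCII. */
int sjis_ascii_trail_count(void)
{
    int c, n = 0;

    for (c = 0; c < 0x80; c++)
        if (sjis_is_trail((unsigned char)c))
            n++;
    return n;
}
```

The backslash (0x5C) is one of these trail bytes, which is why a parser
that looks at single bytes can mistake the second half of a character
for an escape.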