April Chin wrote:
> > April Chin wrote:
> > > > ksh93 already has support for I18N and for multibyte character
> > > > handling. If there is need for change, it should be minimal
> > > > since currently all error message translation goes through a single
> > > > interface.
> > > >
> > > > The multibyte character handling uses the POSIX mb*() interface.
> > > I believe in ksh93 this may not be working in all respects.
> > > An i18n engineer and I tried a few manual tests on ksh93 with
> > > multibyte characters, and ksh93 did not appear to recognize some of
> > > them which may have included an ASCII byte within the multibyte character.
> >
> > April - did this problem occur only in interactive terminal mode or even
> > when a script writes the Japanese/ASCII text mixture (e.g. % cat
> > "ksh93_echo_japanese.ksh93" | ksh # ) ?
>
> The test was done via a ksh93 script.
> On a system with the Japanese locales installed, I set up
> an appropriate locale:
>
> % setenv LANG ja_JP.PCK

You did start a new shell instance before entering the
"% cat sjis.dat | test.sh" sequence below, right? AFAIK ksh88 was not
able to handle sudden changes of the locale correctly...

> I input a file (sjis.dat, attached below)

Thanks for the test case! :-)

> containing multibyte
> characters, including ones with an ASCII byte, to a ksh93 script which
> reads and echoes each line, and then each word in that line.
>
> % cat sjis.dat | test.sh
>
> where test.sh contains:
> #!/usr/bin/ksh93
> read a
> echo $a
>
> for b in $a ;
> do
> echo $b;
> done
>
> Doing the same test with the current Solaris ksh, instead of ksh93,
> output all the characters as expected. ksh93 was able to process
> 2 out of 3 multibyte characters which contained an ASCII component.

CC'ing Glenn Fowler <gsf at research.att.com> to have a look at the
problem...

...

Glenn, do you have any idea what may be wrong here?

> > Linux may suffer from a similar problem, please read
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000838.html
> > and
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000839.html
> >
> > BTW: Which terminal emulator did you use ? Gnome terminal, kconsole,
> > dtterm or xterm ?
>
> The i18n engineer provided me with a terminal emulator for which
> I could turn on sjis mode, the newer mode which will
> accept multibyte characters which may have an ASCII component.
> It sounds like the Linux problem is related to the terminal emulator.

AFAIK this is unlikely, since bash3 on SuSE 10.0 doesn't show the problem
with the same test case. It seems that the issue hides somewhere in the
ksh93 code, or that the glibc multibyte/widechar code is buggy somehow
(AFAIK older bash versions had special code paths for UTF-8 handling
which bypass the normal multibyte/widechar handling... but I could be
wrong here...).

...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
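The sjis.dat attachment mentioned above is not reproduced here. As a stand-in for anyone who wants to repeat the test, the following sketch writes hypothetical test data (not the original attachment; the file name sjis_test.dat is made up): three Shift_JIS characters whose second (trail) byte is 0x5C, i.e. the ASCII backslash, which is the best-known example of an ASCII byte inside a multibyte character.

```shell
#!/bin/sh
# Hypothetical reproduction data: NOT the original sjis.dat attachment.
# In Shift_JIS, each of these characters has 0x5C (ASCII '\') as its trail byte:
#   0x83 0x5C = ソ (katakana "so")
#   0x95 0x5C = 表
#   0x94 0x5C = 能
# POSIX printf understands \ddd octal escapes in the format string,
# so the raw bytes can be written without needing a Shift_JIS editor.
printf '\203\134 \225\134 \224\134\n' > sjis_test.dat

# Dump the raw bytes so the 0x5C trail bytes are visible.
od -An -tx1 sjis_test.dat
```

Feeding this file to the test.sh script above (e.g. `cat sjis_test.dat | ksh93 test.sh`, with LANG=ja_JP.PCK set in a freshly started shell) should echo the three characters unchanged. Since test.sh calls `read` without `-r`, a shell that misclassified a 0x5C trail byte as a standalone backslash would treat it as an escape character; this is one plausible, unverified way the reported failure could surface.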
