April Chin wrote:
> > April Chin wrote:
> > > > ksh93 already has support for I18N and for multibyte character
> > > > handling.  If there is need for change, it should be minimal
> > > > since currently all error message translation goes through a single
> > > > interface.
> > > >
> > > > The multibyte character handling uses the POSIX mb*() interface.
> > > I believe this may not be working in all respects in ksh93.
> > > An i18n engineer and I tried a few manual tests on ksh93 with
> > > multibyte characters, and ksh93 did not appear to recognize some of
> > > them, possibly those that include an ASCII byte within the
> > > multibyte character.
> >
> > April - did this problem occur only in interactive terminal mode, or
> > also when a script writes the Japanese/ASCII text mixture (e.g. % cat
> > "ksh93_echo_japanese.ksh93" | ksh # ) ?
> 
> The test was done via a ksh93 script.
> On a system with the Japanese locales installed, I set up
> an appropriate locale:
> 
> % setenv LANG ja_JP.PCK

You did start a new shell instance before entering the
"% cat sjis.dat | test.sh" sequence below, right ? AFAIK ksh88 was not
able to handle sudden changes in the locale correctly...

> I input a file (sjis.dat, attached below)

Thanks for the testcase ! :-)

> containing multibyte
> characters, including ones with an ASCII byte, to a ksh93 script which
> reads and echoes each line, and then each word in that line.
> 
> % cat sjis.dat | test.sh
> 
> where test.sh contains:
> #!/usr/bin/ksh93
> read a
> echo $a
> 
> for b in $a ;
> do
>         echo $b;
> done
> 
> Doing the same test with the current Solaris ksh, instead of ksh93,
> output all the characters as expected.  ksh93 was able to process
> 2 out of 3 multibyte characters which contained an ASCII component.
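
For anyone following along, here is a minimal sketch of what "multibyte
character with an ASCII component" means in Shift-JIS. It does not use
the contents of sjis.dat; the sample character (katakana "so", bytes
0x83 0x5C) is my own choice, and hexbytes and mark_ascii are
hypothetical helper names, not part of ksh93 or any library. The second
byte, 0x5C, is the same value as an ASCII backslash, so code that scans
byte by byte instead of character by character can misread it.

```shell
#!/bin/sh
# Hypothetical helper: print each byte of stdin as two hex digits,
# one byte per line.
hexbytes() {
    od -An -v -tx1 | tr -s ' \n' '\n' | sed '/^$/d'
}

# Hypothetical helper: label each hex byte read from stdin as falling
# inside or outside the 7-bit ASCII range.
mark_ascii() {
    while read -r byte; do
        if [ "$((0x$byte))" -lt 128 ]; then
            printf '%s %s\n' "$byte" "ascii-range"
        else
            printf '%s %s\n' "$byte" "non-ascii"
        fi
    done
}

# Assumed sample: katakana "so" in Shift-JIS is the byte pair 0x83 0x5C
# (octal 203 134); 0x5C is also the ASCII backslash.
printf '\203\134' | hexbytes | mark_ascii
```

This should print "83 non-ascii" followed by "5c ascii-range". Piping
sjis.dat through the same two helpers would show which of its
characters carry a trailing byte in the ASCII range.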

CC:'ing Glenn Fowler <gsf at research.att.com> to have a look at the
problem...
... Glenn - do you have any idea what may be wrong here ?

> > Linux may suffer from a similar problem, please read
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000838.html
> > and
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000839.html
> >
> > BTW: Which terminal emulator did you use ? Gnome terminal, kconsole,
> > dtterm or xterm ?
> 
> The i18n engineer provided me with a terminal emulator for which
> I could turn on sjis mode, the newer mode which will
> accept multibyte characters which may have an ASCII component.
> It sounds like the Linux problem is related to the terminal emulator.

AFAIK this is unlikely, since bash3 on SuSE 10.0 doesn't show the problem
for the same testcase. It seems that the issue hides somewhere in the
ksh93 code, or that the glibc multibyte/widechar code is buggy somehow
(AFAIK the old bash versions had special codepaths for UTF-8 handling
which bypass the normal multibyte/widechar handling... but I could be
wrong here...) ...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
