Hi!

----

I found an issue in ksh93s-_20060912 on Solaris 11/B48/i386 which may be
related to the substring operator "${strvar:index:ressize}": the operator
does not seem to handle multibyte characters correctly.

The following testcase...
-- snip --
(TESTSHELL=/path/to_shell_which_should_be_tested
export LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
$TESTSHELL -c 'cat /usr/pub/UTF-8 | while read i ; do
echo "a=$(printf "%s\n" "$i" | /usr/bin/wc -m) b=${#i} c=$( (for (( ci=0 ; ci < ${#i} ; ci++ )) ; do printf "%s" "${i:$ci:1}" ; done) | /usr/bin/wc -m)"
done') | head -20
-- snip --
("/usr/pub/UTF-8" is a sample file which contains a large range of
Unicode characters encoded in UTF-8)

...returns the following output when
"TESTSHELL=/path/to_shell_which_should_be_tested" is replaced by
/bin/bash ("GNU bash, version 3.00.16(1)-release
(i386-pc-solaris2.11)"):
-- snip --
a=      73 b=72 c=      72
a=      73 b=72 c=      72
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      71 b=70 c=      70
a=      72 b=71 c=      71
a=      74 b=73 c=      73
a=      74 b=73 c=      73
a=      74 b=73 c=      73
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      70 b=69 c=      69
a=      70 b=69 c=      69
-- snip --
(this is AFAIK the expected behaviour)

ksh93s-_20060912 returns different output:
-- snip --
a=      73 b=72 c=      72
a=      73 b=72 c=      72
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      72 b=71 c=      71
a=      71 b=70 c=      70
a=      72 b=71 c=      71
a=      74 b=73 c=      73
a=      74 b=73 c=      73
a=      74 b=73 c=      73
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      72 b=71 c=      59
a=      70 b=69 c=      57
a=      70 b=69 c=      57
-- snip --

Note the "c=" values in the last ten lines: as soon as multibyte
characters are read from /usr/pub/UTF-8, "c=" (the character count
obtained by walking the string with the substring operator) no longer
matches "b=" (the value of ${#i}). I have uploaded a bzip2'ed version of
/usr/pub/UTF-8 to
http://www.opensolaris.org/os/project/ksh93-integration/downloads/solaris_11_b48__usr_pub_UTF-8.bz2
for testing purposes.
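
For checking a single string instead of the whole file, the comparison
can be reduced to a small helper. This is only a sketch of the same idea
as the testcase: the function name "check_substr" and its arguments are
made up for illustration and do not appear in the testcase, and it
assumes that the shell being tested supports "for (( ... ))" and
"${var:index:length}" (as ksh93 and bash do) and that "wc -m" is found
in PATH.

```shell
# Hypothetical helper, not part of the original testcase.
# $1 = shell to test, $2 = string to check
# Prints "b=<value of ${#s}> c=<characters emitted via ${s:index:1}>".
check_substr() {
    "$1" -c '
        s=$1
        # Walk the string one character at a time with the substring
        # operator and count what comes out with wc -m.
        c=$( (for (( ci=0 ; ci < ${#s} ; ci++ )) ; do
                  printf "%s" "${s:$ci:1}"
              done) | wc -m )
        # $(( c )) strips the padding some wc implementations add.
        echo "b=${#s} c=$(( c ))"
    ' testshell "$2"
}
```

With LC_ALL and LANG exported as in the testcase, calling the helper
once with /bin/bash and once with the ksh93 binary on a line taken from
/usr/pub/UTF-8 should show whether "b=" and "c=" diverge for that
line.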

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
