Hi!

----

Fresh from our bug database: The ${str:offset:size} operator seems to
scale poorly for large str sizes (e.g. putting 4MB of characters in a
multibyte locale such as "en_US.UTF-8" into a string takes forever).

For example:
-- snip --
$ (for ((z=10 ; z < 16 ; z++ )) ; do
    printf "#### z=%d:\n" z ; export z
    timex ksh93 -c 'integer i len ; s="x"
      for ((i=0 ; i < z ; i++)) ; do s+="$s" ; done   # s now has 2^z characters
      len=${#s} ; print "len=$len"
      for ((i=0 ; i < len ; i++ )) ; do buf="${s:i:2}" ; done'
  done)
#### z=10:
len=1024

real           0.23
user           0.15
sys            0.06

#### z=11:
len=2048

real           0.54
user           0.44
sys            0.09

#### z=12:
len=4096

real           1.71
user           1.62
sys            0.06

#### z=13:
len=8192

real           6.43
user           6.32
sys            0.06

#### z=14:
len=16384

real          28.19
user          27.63
sys            0.09

#### z=15:
len=32768

real        1:44.89
user        1:43.41
sys            0.22
-- snip --
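... and so on... from z=12 onwards each doubling of the length roughly
quadruples the run time, so the loop as a whole looks quadratic in the
string length. For 4MB of data it's getting really really nasty...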

I'm currently scratching my head over how to solve the problem - there is
no simple way to fetch character "x" from a multibyte string without
scanning the string from the beginning and using |mblen()| to walk over
the data...
... would it be possible to change the variable storage system a bit and
create an array of |wchar_t| on demand for string operators like
${str:offset:size} (the array is kept around as a "cache" until
someone writes to the variable)?
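
To make the idea a bit more concrete, here is a rough sketch in plain C of
what such an on-demand |wchar_t| cache could look like. This is NOT how
ksh93 stores variables; |struct var|, |var_set()|, |var_fill_cache()| and
|var_substr()| are names made up for this illustration. Only the libc
calls (|mbstowcs()|, |wctomb()|) are real. The value is decoded once on
the first substring access, and the cache is dropped on every write:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical variable record, not ksh93's real storage layout. */
struct var {
    char    *mb;      /* multibyte value, as stored today */
    wchar_t *wide;    /* decoded copy, built on first substring access */
    size_t   nchars;  /* number of characters in wide[] */
};

/* Writing to the variable replaces the value and drops the cache. */
static int var_set(struct var *v, const char *value)
{
    char *copy = malloc(strlen(value) + 1);
    if (!copy)
        return -1;
    strcpy(copy, value);
    free(v->mb);
    free(v->wide);
    v->mb = copy;
    v->wide = NULL;
    v->nchars = 0;
    return 0;
}

/* Decode the whole value once: O(n) here, O(1) per character afterwards. */
static int var_fill_cache(struct var *v)
{
    size_t len, n;
    wchar_t *w;

    if (v->wide)
        return 0;                     /* cache is still valid */
    if (!v->mb)
        return -1;
    len = strlen(v->mb);              /* characters never outnumber bytes */
    w = malloc((len + 1) * sizeof *w);
    if (!w)
        return -1;
    n = mbstowcs(w, v->mb, len + 1);
    if (n == (size_t)-1) {            /* invalid multibyte sequence */
        free(w);
        return -1;
    }
    v->wide = w;
    v->nchars = n;
    return 0;
}

/* Rough equivalent of ${v:offset:size}; the caller frees the result. */
static char *var_substr(struct var *v, size_t offset, size_t size)
{
    char *out, *p;
    size_t i;

    if (var_fill_cache(v) != 0)
        return NULL;
    if (offset > v->nchars)           /* clamp the range to the string */
        offset = v->nchars;
    if (size > v->nchars - offset)
        size = v->nchars - offset;
    out = malloc(size * MB_CUR_MAX + 1);
    if (!out)
        return NULL;
    p = out;
    for (i = 0; i < size; i++) {      /* direct index, no rescan from 0 */
        int n = wctomb(p, v->wide[offset + i]);
        if (n < 0) {
            free(out);
            return NULL;
        }
        p += n;
    }
    *p = '\0';
    return out;
}
```

A caller would start from |struct var v = {0};| and call
|setlocale(LC_CTYPE, "")| first so that the multibyte functions follow the
user's locale. With this layout the loop from the example above would pay
for one decode pass and then do a direct array lookup per iteration. The
price is memory: on platforms where |wchar_t| is 4 bytes, 4MB of
characters need a 16MB cache.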

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
