Glenn,
Thank you very much. Your suggestion that I use ~(-g) did the trick.
Not only is the code now blazing fast, but, since I needed the left-most match,
it (serendipitously) fixed the problem that my previous code was finding the
right-most match. Here are the before and after versions.
x=YY:__AB_CD_EF____12_34_56____PQ_RS_TU__:ZZ
[[ $x == @(*__+(+([A-Z0-9])?(_))__*) ]]
print "${.sh.match[2]}"
PQ_RS_TU
[[ $x == ~(-g:@(*__+(+([A-Z0-9])?(_))__*)) ]]
print "${.sh.match[2]}"
AB_CD_EF
BTW, this is used in some code that marches along such lines, from left to
right, finding and replacing tokens bounded by double underscores. Each token
consists of one or more digits and/or uppercase letters, optionally separated by
underscores. An embedded underscore must not be adjacent to another underscore
or appear at either end of the token. I also wanted to always be able to recover
the match from the same index-numbered element of .sh.match, in this case
index 2.
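For illustration only, the left-to-right scan can be roughly sketched without
any (...) groups at all. This is not the ksh pattern shown above: it swaps in
plain POSIX parameter expansion, the function name leftmost_token is
hypothetical, and it only approximates the token rule (it stops at the first
"__" it finds after a token starts).

```shell
# Hypothetical helper: print the left-most token bounded by double
# underscores, or return 1 if the line contains none.
# POSIX sh has no "local", so line, rest and token are globals here.
leftmost_token() {
    line=$1
    while :; do
        case $line in
            *__*__*) ;;              # an opening and a closing "__" remain
            *) return 1 ;;           # nothing left that could bound a token
        esac
        rest=${line#*__}             # drop everything through the first "__"
        token=${rest%%__*}           # keep the text before the next "__"
        case $token in
            ''|_*|*_|*[!A-Z0-9_]*)
                line=$rest ;;        # not a valid token, keep marching right
            *)
                printf '%s\n' "$token"
                return 0 ;;
        esac
    done
}

leftmost_token 'YY:__AB_CD_EF____12_34_56____PQ_RS_TU__:ZZ'
```

Run against the sample line, this prints AB_CD_EF, the same left-most token
that ~(-g) recovers. Each pass either returns or shortens the line, so the
loop always terminates.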
Thanks,
Dan
----- Original Message -----
From: "Glenn Fowler" <[email protected]>
To: "Dan Rickhoff" <[email protected]>
Cc: [email protected]
Sent: Tuesday, October 29, 2013 7:24:26 AM
Subject: Re: [ast-users] If this is a bug, what ksh version must I upgrade to?
it's a performance problem with the underlying regex
whenever (...) groups are involved it has to work harder
if you only care about *any* match vs the longest of the leftmost matches then
prefix the pattern with ~(-g)
which means "not greedy" or "minimal"
this loop shows the time deterioration
x=
for ((i = 1; i <= 20; i++))
do x=x$x
time -f %E $SHELL -c "[[ x__${x}__x == *@(__+(+(x)?(_))__)* ]]; printf '%d %2d\n' $? $i"
done
On Tue, Oct 29, 2013 at 8:40 AM, Dan Rickhoff < [email protected] >
wrote:
<blockquote>
If this is a ksh bug, what ksh version should I upgrade to?
On:
OS: Red Hat Enterprise Linux Server release 6.1 (Santiago)
ksh: version sh (AT&T Research) 93t+ 2010-06-21
Elapsed time less than 2 tenths of a second:
$ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_Z_EEEE__F ==
*@(__+(+([A-Z0-9])?(_))__)* ]]'
0:00.14
However, if that string is extended by adding, say, seven more "Z"s, then the
elapsed time mushrooms to almost 10 seconds.
$ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_ZZZZZZZZ_EEEE__F ==
*@(__+(+([A-Z0-9])?(_))__)* ]]'
0:09.96
This appears to be a ksh bug (a memory leak?). What ksh version must I upgrade
to in order to get past it?
Please let me know if I should provide further information.
Thanks,
Dan
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
</blockquote>