in a private note Dan asked if ksh had ~(...) magic for doing the rightmost
match and I correctly said NO
it turns out ast regex and ksh extended patterns can do rightmost match
with negation rather than magic flags/options
it just took a night's sleep to figure out a neat use for negation

this pattern will return the token of the rightmost match in ${.sh.match[5]}

~(-glr)!(__+(+([A-Z0-9])?(_))__)@(__+(+([A-Z0-9])?(_))__)

-g non-greedy
-l no left anchor
-r no right anchor
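
for comparison, here is a minimal portable sketch that pulls the leftmost and rightmost tokens out of Dan's sample string without any ~(...) options. It swaps in a different technique, plain POSIX parameter expansion (shortest vs longest prefix/suffix removal), and it assumes the string contains at least one well-formed __TOKEN__; unlike the pattern above it does not validate the token characters. The variable names t, leftmost and rightmost are made up for illustration.

```shell
# Sketch only: parameter expansion, not ~(...) pattern options.
# Assumes $x holds at least one well-formed __TOKEN__ (no validation done).
x=YY:__AB_CD_EF____12_34_56____PQ_RS_TU__:ZZ

# leftmost token
t=${x#*__}           # shortest prefix ending in "__": drop through the first "__"
leftmost=${t%%__*}   # longest suffix starting with "__": drop from the next "__" on

# rightmost token
t=${x%__*}           # shortest suffix starting with "__": drop from the last "__" on
rightmost=${t##*__}  # longest prefix ending in "__": drop through the last remaining "__"

printf '%s\n' "$leftmost" "$rightmost"
# prints AB_CD_EF then PQ_RS_TU
```

the # / ## and % / %% pairs show the same minimal vs maximal distinction that ~(-g) toggles inside a pattern.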




On Wed, Oct 30, 2013 at 10:01 AM, <[email protected]> wrote:

> Glenn,
>
> Thank you very much.  Your suggestion that I use ~(-g) did the trick.
>
> Not only is the code blazing fast, but, since I needed the left-most
> match, it (serendipitously) fixed the problem that my previous code was
> finding the right-most match.  Here are the before and after versions.
>
> x=YY:__AB_CD_EF____12_34_56____PQ_RS_TU__:ZZ
>
> [[ $x == @(*__+(+([A-Z0-9])?(_))__*) ]]
> print "${.sh.match[2]}"
> PQ_RS_TU
>
> [[ $x == ~(-g:@(*__+(+([A-Z0-9])?(_))__*)) ]]
> print "${.sh.match[2]}"
> AB_CD_EF
>
> BTW, this is used in some code that marches along such lines, from left
> to right, finding and replacing tokens bounded by double underscores.  Each
> token consists of one or more digits, uppercase letters, and/or
> underscores, where the token's embedded underscores must not be adjacent
> to another underscore or appear at the token's ends.  I also wanted to
> always be able to recover the match from the same index-numbered element of
> ${.sh.match}, in this case index 2.
>
> Thanks,
> Dan
>
> ------------------------------
> *From: *"Glenn Fowler" <[email protected]>
> *To: *"Dan Rickhoff" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Tuesday, October 29, 2013 7:24:26 AM
> *Subject: *Re: [ast-users] If this is a bug, what ksh version must I
> upgrade to?
>
>
> it's a performance problem with the underlying regex
> whenever (...) groups are involved it has to work harder
> if you only care about *any* match vs the longest of the leftmost matches
> then prefix the pattern with ~(-g)
> which means "not greedy" or "minimal"
> this loop shows the time deterioration
>
> x=
> for ((i = 1; i <= 20; i++))
> do      x=x$x
>         time -f %E $SHELL -c "[[ x__${x}__x == *@(__+(+(x)?(_))__)* ]];
> printf '%d %2d ' \$? $i"
> done
>
>
>
>
> On Tue, Oct 29, 2013 at 8:40 AM, Dan Rickhoff <[email protected]> wrote:
>
>>
>> If this is a ksh bug, what ksh version should I upgrade to?
>>
>> On:
>>      OS: Red Hat Enterprise Linux Server release 6.1 (Santiago)
>>      ksh: version sh (AT&T Research) 93t+ 2010-06-21
>>
>> Elapsed time less than 2 tenths of a second:
>>
>> $ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_Z_EEEE__F ==
>> *@(__+(+([A-Z0-9])?(_))__)* ]]'
>> 0:00.14
>>
>> However, if that string is extended by adding, say, seven more "Z"s, then
>> the elapsed time mushrooms to almost 10 seconds.
>>
>> $ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_ZZZZZZZZ_EEEE__F ==
>> *@(__+(+([A-Z0-9])?(_))__)* ]]'
>> 0:09.96
>>
>> This appears to be a ksh bug (a memory leak?).  What ksh version must I
>> upgrade to in order to get past it?
>>
>> Please let me know if I should provide further information.
>>
>> Thanks,
>> Dan
>>
>> _______________________________________________
>> ast-users mailing list
>> [email protected]
>> http://lists.research.att.com/mailman/listinfo/ast-users
>>
>>
>
>
