Glenn, I think one detail of the issue is that all modern Linux
distributions use the UTF-8 multibyte encoding by default.
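
One rough way to see how much the locale contributes is to run the same test under the current locale and under LC_ALL=C. The sketch below reuses Dan's shorter test string from the quoted message; the helper name run_in_locale is made up for illustration, and it assumes a ksh binary is on the PATH:

```shell
# run_in_locale LOCALE CMD [ARGS...]  (hypothetical helper, not part of ksh)
# Runs CMD with LC_ALL forced to LOCALE for that one command only.
run_in_locale() {
    _loc=$1
    shift
    LC_ALL=$_loc "$@"
}

# Dan's shorter test string, taken from the quoted message.
test_cmd='[[ A__BBBBBBBB_CCCCC_Z_EEEE__F == *@(__+(+([A-Z0-9])?(_))__)* ]]'

# Assumption: the shell under test is installed as "ksh".
if command -v ksh >/dev/null 2>&1; then
    for loc in "${LC_ALL:-${LANG:-C}}" C; do
        printf 'locale=%s charmap=%s ' "$loc" "$(run_in_locale "$loc" locale charmap)"
        if run_in_locale "$loc" ksh -c "$test_cmd"; then
            echo matched
        else
            echo 'no match'
        fi
    done
else
    echo 'ksh not found, skipping' >&2
fi
```

Putting time in front of the ksh call should show whether the C locale run is noticeably faster on a given machine.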

Olga

On Tue, Oct 29, 2013 at 6:24 PM, Glenn Fowler <[email protected]> wrote:
> my guess is it's an n^2 or worse problem that would eventually cause
> problems even with *s++
> besides, for LC_ALL=C ast defaults back to *s++ anyway, modulo one ?: test
> for each *s++
>
>
> On Tue, Oct 29, 2013 at 1:09 PM, ольга крыжановская
> <[email protected]> wrote:
>>
>> Glenn, a possible optimization is to run the regex patterns on a
>> wchar_t string and not a byte string. It would eliminate all the mb*()
>> calls which are often called during backtracking, and represent a
>> major hit at run time.
>>
>> Olga
>>
>> On Tue, Oct 29, 2013 at 3:24 PM, Glenn Fowler <[email protected]>
>> wrote:
>> > it's a performance problem with the underlying regex
>> > whenever (...) groups are involved it has to work harder
>> > if you only care about *any* match vs the longest of the leftmost matches
>> > then prefix the pattern with ~(-g)
>> > which means "not greedy" or "minimal"
>> > this loop shows the time deterioration
>> >
>> > x=
>> > for ((i = 1; i <= 20; i++))
>> > do      x=x$x
>> >         time -f %E $SHELL -c "[[ x__${x}__x == *@(__+(+(x)?(_))__)* ]]; printf '%d %2d ' $? $i"
>> > done
>> >
>> >
>> >
>> >
>> > On Tue, Oct 29, 2013 at 8:40 AM, Dan Rickhoff <[email protected]>
>> > wrote:
>> >>
>> >>
>> >> If this is a ksh bug, what ksh version should I upgrade to?
>> >>
>> >> On:
>> >>      OS: Red Hat Enterprise Linux Server release 6.1 (Santiago)
>> >>      ksh: version sh (AT&T Research) 93t+ 2010-06-21
>> >>
>> >> Elapsed time less than 2 tenths of a second:
>> >>
>> >> $ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_Z_EEEE__F == *@(__+(+([A-Z0-9])?(_))__)* ]]'
>> >> 0:00.14
>> >>
>> >> However, if that string is extended by adding, say, seven more "Z"s,
>> >> then the elapsed time mushrooms to almost 10 seconds.
>> >>
>> >> $ time -f '%E\n' ksh -e '[[ A__BBBBBBBB_CCCCC_ZZZZZZZZ_EEEE__F == *@(__+(+([A-Z0-9])?(_))__)* ]]'
>> >> 0:09.96
>> >>
>> >> This appears to be a ksh bug (a memory leak?). What ksh version must I
>> >> upgrade to in order to get past it?
>> >>
>> >> Please let me know if I should provide further information.
>> >>
>> >> Thanks,
>> >> Dan
>> >>
>> >> _______________________________________________
>> >> ast-users mailing list
>> >> [email protected]
>> >> http://lists.research.att.com/mailman/listinfo/ast-users
>> >>
>> >
>> >
>> >
>>
>>
>>
>> --
>>       ,   _                                    _   ,
>>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
>> .----'-/`-/     [email protected]   \-`\-'----.
>>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>>       `--`                                      `--`
>
>
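
To compare the default (longest) match with the ~(-g) form Glenn describes, his loop can be parameterised. This is only a sketch: build_subject, build_test and run_scaling are names invented here, KSH_BIN is an assumed override variable, and no timings are claimed.

```shell
# build_subject N  (hypothetical helper)
# Mirrors the quoted loop: each pass of "x=x$x" prepends one "x",
# so after N passes the subject is x__ + N copies of x + __x.
build_subject() {
    _n=$1 _x= _k=1
    while [ "$_k" -le "$_n" ]; do
        _x=x$_x
        _k=$((_k + 1))
    done
    printf 'x__%s__x\n' "$_x"
}

# build_test SUBJECT PREFIX  (hypothetical helper)
# Composes the [[ ... ]] test; pass '~(-g)' as PREFIX to ask for the
# minimal match, or '' for the default longest match.
build_test() {
    printf '[[ %s == %s*@(__+(+(x)?(_))__)* ]]\n' "$1" "$2"
}

# run_scaling MAX  (hypothetical helper)
# Runs both variants for subject sizes 1..MAX and prints the exit status.
run_scaling() {
    max=$1
    ksh_bin=${KSH_BIN:-ksh}   # assumption: ksh93 is installed under this name
    if ! command -v "$ksh_bin" >/dev/null 2>&1; then
        echo "$ksh_bin not found, skipping" >&2
        return 0
    fi
    i=1
    while [ "$i" -le "$max" ]; do
        subject=$(build_subject "$i")
        for prefix in '' '~(-g)'; do
            if "$ksh_bin" -c "$(build_test "$subject" "$prefix")"; then
                rc=0
            else
                rc=$?
            fi
            printf 'i=%d prefix=%s status=%d\n' "$i" "${prefix:-none}" "$rc"
        done
        i=$((i + 1))
    done
}

run_scaling 3   # small bound so the sketch finishes quickly
```

Raising the bound toward 20, as in the quoted loop, and wrapping each ksh call in time should show how the two variants scale.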


