On Sun, Oct 27, 2013 at 3:38 AM, Peter Bex <peter....@xs4all.nl> wrote:
> On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote: > > This regex is so slow that you don't need a timer to see the impact (at > > least not on my machine with chicken 4.8.0): > > > > (string-match "[a-z][a-z0-9\\-_.]{0,20}" > "a012345678901234567890123456789") > > > > Changing the {0,20} to + makes it run normally fast so I just replaced > the > > regex with a string-length and modified the "{0,20}" to "+" . I don't > > necessarily need a fix for this but it seems like a possible symptom of a > > deeper problem so I thought I'd report it. > > Hi Matt, > > Thanks for your report. I'm afraid this is a known problem with > Irregex - to avoid producing a state machine with too many states, > it will always use a backtracking implementation for all repetition > counts. > > I think it's best to take a look at how to fix this upstream first. > Maybe Alex has an idea of how to do that. > It's possible to expand repetition counts in the DFA. Your example basically becomes: "[a-z][a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?" which in this case probably has a reasonably sized DFA. In other cases, these patterns can easily blow up and force the backtracking fallback. I'm not sure if it's better to try more aggressively or to keep the easier rule of thumb that n..m repetitions always force backtracking. > Pre-compiling the regex didn't seem to make any difference. > > That's because the backtracker matches really slowly. > There is possibly some pathological backtracking happening here, I'll take a look. The long-term goal is to replace the backtracking with a more scalable approach (e.g. the non-backtracking NFA used in the SRFI 115 reference implementation) and only use DFAs for simple patterns or when speed is explicitly requested. Doing this and preserving all PCRE features will take time though (especially backrefs). -- Alex
_______________________________________________ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users