Summary: Improve performance of std.regex
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos

--- Comment #0 from Jesse Phillips 2012-02-09 09:27:58 PST ---
The previous implementation is said to cache the last used engine.
The english.dic used for these timings has 134,950 entries.

Test code
import std.file;
import std.string;
import std.datetime;
import std.regex;

private int[string] model;

void main() {
   auto name = "english.dic";
   foreach(w; std.file.readText(name).toLower.splitLines)
      model[w] += 1;

   foreach(w; std.string.split(readText(name)))
      // regex() is called on every iteration on purpose, to avoid caching
      if(!match(w, regex(r"\d")).empty)
         {}
      else if(!match(w, regex(r"\W")).empty)
         {}
}

I'm trying to avoid the caching here, but I still see better performance in
2.056. Note that these timings were taken with MinGW on Windows. I find it odd
that user time is fast while real time is the slow part; does MinGW have
access to the proper information?

$ time ./test2.056.exe

real    0m0.860s
user    0m0.047s
sys     0m0.000s

$ time ./test2.058.exe

real    0m55.500s
user    0m0.031s
sys     0m0.000s
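
Since the shell's user/sys figures look unreliable here, one way to cross-check
is to measure wall-clock time inside the program. The following is only a
sketch, not the code used for the timings above. It assumes a recent compiler
where StopWatch lives in std.datetime.stopwatch and std.regex provides
matchFirst; the helper names countPerIteration and countHoisted and the
stand-in word list are hypothetical. It compares compiling the pattern on
every iteration with compiling it once outside the loop. Depending on the
Phobos version, regex() may itself cache recently compiled patterns, so the
gap between the two may be small or absent.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.regex : matchFirst, regex;
import std.stdio : writefln;

// Count words containing a digit, calling regex() on every iteration
// (the same pattern of use as the test code in this report).
size_t countPerIteration(string[] words)
{
    size_t hits;
    foreach (w; words)
        if (!matchFirst(w, regex(r"\d")).empty)
            ++hits;
    return hits;
}

// Same count, but regex() is called once, outside the loop.
size_t countHoisted(string[] words)
{
    auto digit = regex(r"\d");
    size_t hits;
    foreach (w; words)
        if (!matchFirst(w, digit).empty)
            ++hits;
    return hits;
}

void main()
{
    // Hypothetical stand-in data; the report reads english.dic instead.
    string[] words;
    foreach (i; 0 .. 2_000)
        words ~= (i % 10 == 0) ? "word7" : "word";

    // Wall-clock time measured in-process, independent of the shell's `time`.
    auto sw = StopWatch(AutoStart.yes);
    immutable a = countPerIteration(words);
    immutable perIteration = sw.peek();

    sw.reset(); // restarts the measurement without stopping the watch
    immutable b = countHoisted(words);
    immutable hoisted = sw.peek();

    writefln("per-iteration: %s hits in %s ms", a, perIteration.total!"msecs");
    writefln("hoisted:       %s hits in %s ms", b, hoisted.total!"msecs");
}
```

If the in-process numbers agree with the "real" column rather than the "user"
column, that would suggest the slowdown is genuine and only the MinGW
user-time accounting is off.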

