Summary: Add (?:) non-capturing parentheses group support to
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos

--- Comment #0 from Dmitry Olshansky <> 2010-11-05 09:35:15 PDT ---
Intro: Non-capturing parentheses group the regex so you can apply regex
operators, but do not capture anything and do not create backreferences. 
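Current Phobos `std.regex` does support `(?:...)`. Assuming that, here is a minimal sketch of the difference described above. `captureSlots` is a hypothetical helper that counts the slots in the first match's `Captures`: slot 0 is always the whole match, and each capturing group adds one more.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: number of slots in the first match of `pattern`
/// in `input` (whole match plus one slot per capturing group).
size_t captureSlots(string pattern, string input)
{
    return matchFirst(input, regex(pattern)).length;
}

void main()
{
    // (?:abc){3}: the group only feeds the {3} quantifier; nothing is captured.
    writeln(captureSlots(`(?:abc){3}`, "abcabcabc")); // 1
    // (abc){3}: same overall match, but group 1 is recorded as well.
    writeln(captureSlots(`(abc){3}`, "abcabcabc"));   // 2
}
```

Both patterns match the same text. Only the capturing version pays for storing a submatch.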

//A very dumb example, matches abcabcabc, no backrefs created
//A decent attempt to snatch the href field of an <a> HTML tag, without unnecessary backrefs
<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)> 
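Here is a minimal sketch of the href example in use. It assumes current Phobos (`matchFirst` from `std.regex`), and `hrefOf` is a hypothetical helper. The three auxiliary groups are non-capturing, so the URL ends up in group 1 and the match carries only two slots.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: href value of the first <a> tag in `html`,
/// or null if none matches. Only ([^"<> ]*) captures.
string hrefOf(string html)
{
    auto re = regex(`<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`);
    auto m = matchFirst(html, re);
    return m.empty ? null : m[1];
}

void main()
{
    writeln(hrefOf(`<a class="nav" href="page.html">`)); // page.html
    writeln(hrefOf(`<A href=index.html>`));              // index.html
}
```

As in the original pattern, only lowercase `href` is recognised. The `(?:a|A)` alternation covers the tag name only.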

Rationale: the ECMA-262 standard mentioned on … requires support for this
construct. Sooner or later we should get rid of the caveat
"however, some of the very advanced forms may behave slightly differently",
especially since in this case the change is simple. See the attached patch.

Backtracking over capturing groups is also costly. See the benchmark code and
results (this uses the proposed patch):
import std.regex, std.stdio, std.datetime;

void main(){
    // Same pattern twice: r1 uses non-capturing groups, r2 capturing ones.
    auto r1 = regex(`(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`,"g");
    auto r2 = regex(`(a|A)([^<>]*)href *= *"?([^"<> ]*)"?([^<>]*)>`,"g");
    void nobackref(){
        match(`<a href =  id="G"/>`,r1).hit;
    }
    void backref(){
        match(`<a href =  id="G"/>`,r2).hit;
    }
    // Run each matcher 1000 times and report the elapsed time.
    auto bench = benchmark!(nobackref,backref)(1_000);
    writeln("No backref:   ",bench[0].milliseconds);
    writeln("With backref: ",bench[1].milliseconds);
}
Results on my machine (milliseconds, min .. max over 10 runs):
No backref:   256.955 .. 267.341
With backref: 580.636 .. 587.187
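The benchmark above targets the 2010-era API (`std.datetime.benchmark`, `match`). Here is a minimal sketch of the same comparison against current Phobos. It assumes `std.datetime.stopwatch.benchmark` and `std.regex.matchFirst`. The module-level `input` constant is our own naming, and the `"g"` flag is dropped because `matchFirst` stops at the first hit anyway.

```d
import std.datetime.stopwatch : benchmark;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

enum input = `<a href =  id="G"/>`;

void main()
{
    auto r1 = regex(`(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`);
    auto r2 = regex(`(a|A)([^<>]*)href *= *"?([^"<> ]*)"?([^<>]*)>`);

    // Same overall match; r1 records 1 group, r2 records 4.
    assert(matchFirst(input, r1).hit == matchFirst(input, r2).hit);

    void nobackref() { matchFirst(input, r1); }
    void backref()   { matchFirst(input, r2); }

    // Run each function 1000 times; bench holds one Duration per function.
    auto bench = benchmark!(nobackref, backref)(1_000);
    writeln("No backref:   ", bench[0].total!"msecs", " ms");
    writeln("With backref: ", bench[1].total!"msecs", " ms");
}
```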

P.S. I rebuilt Phobos (on Windows) and ran the unittests; output:
 --- std.socket(660) broken test ---
 (std.socket.HostException: Address family mismatch)
args.length = 1
args[0] = 'C:\dmd2\src\phobos\unittest.exe~T'
Vendor string:    AuthenticAMD
Processor string: AMD Phenom(tm) II X4 940 Processor
Signature:        Family=16 Model=4 Stepping=2
Features:         MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading:   4 threads / 4 cores

