*Spoiler*: let's slowly deprecate the "g" option in std.regex over a few years or, with any luck, a bit faster. A better replacement is proposed below.

For better or worse, the current API has retained a high level of compatibility with the old API. That means I missed the chance to fix it when I could. Here is the biggest (and hardest) problem I have with it:

foreach(m; match("bleh-blah", "bl[ea]h"))
{
        writeln(m.hit);
}

The "quiz" is - how many lines will this print?

The current answer is 1. The right way to get all matches is:

foreach(m; match("bleh-blah", regex("bl[ea]h","g")))
{
        writeln(m.hit);
}

This not only looks unsightly but also conflates an option of the operation (find _all_ vs find _first_) with a property of the pattern (like case-insensitivity). And if the regex pattern is defined elsewhere, it can easily introduce a bug (albeit one that's "usually" easy to track down).

To underline the point: std.regex.splitter doesn't take the "g" flag into account at all (it makes no sense there).
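The behavior described above can be checked with a short program. This is a minimal sketch, assuming a Phobos version in which match still honors "g" the way this post describes; countMatches is a hypothetical helper introduced only for illustration.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.regex : match, regex, splitter;

// Hypothetical helper: counts how many matches match() yields
// for a pattern compiled with the given flags.
size_t countMatches(string input, string pattern, string flags = "")
{
    return match(input, regex(pattern, flags)).walkLength;
}

void main()
{
    // Without "g" the range stops after the first match.
    assert(countMatches("bleh-blah", "bl[ea]h") == 1);
    // With "g" every match is produced.
    assert(countMatches("bleh-blah", "bl[ea]h", "g") == 2);

    // splitter splits at every separator, with or without "g".
    assert(equal(splitter("a,b,c", regex(",")), ["a", "b", "c"]));
    assert(equal(splitter("a,b,c", regex(",", "g")), ["a", "b", "c"]));
}
```

The same call site gives a different number of results depending on a flag that may have been set far away, where the pattern was compiled.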

I've pondered a couple of solutions in a bug report by bearophile:
http://d.puremagic.com/issues/show_bug.cgi?id=7260

After all of these ideas were born and discarded, here is what I believe is the way forward out of this mess:

Make "g" indicate only the intended _default_ search mode of the pattern (global vs. first match).

The user is free to override this default explicitly, and is in fact encouraged to do so. Attaching a default search mode to the regex pattern is itself marked as discouraged.

The overrides have to be convenient and backwards compatible.
Thus I propose the following:

match and replace become structs (types, oh my!) with the following "interface":

struct match //ditto  for replace
{
        //current behavior
        static auto opCall(.....);
        // get the first match / replace only the first occurrence
        static auto first();
        // force finding all matches (still a lazy range) / replace all occurrences
        static auto all();
}

OT: C++ folks call this a namespace, but they don't have static opCall - suckers ;) And I actually proposed (twice) to kill static opCall, sweet irony.

Then the motivating example would be :

foreach(m; match.all("bleh-blah", "bl[ea]h"))
{
        writeln(m.hit);
}

and :

//prints all submatches of the first match:
foreach(m; match.first("bleh-blah", "bl[ea]h"))
{
        // doesn't compile: match.first yields the first match itself,
        // so m is one of its submatches and has no .hit
        // that should make it harder to confuse
        // "first match" with "all matches"
        //writeln(m.hit);
        writeln(m);
}
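The interface and the two examples above can be approximated today on top of the existing API. The following is only a sketch under assumptions, not the proposed implementation: the struct simply forwards to the current std.regex.match (imported under the hypothetical alias stdMatch), `all` forces the "g" flag, and `first` returns the captures of the first match.

```d
import std.range : walkLength;
import std.regex : regex, stdMatch = match;

// Sketch only: a struct used as a namespace, as in the proposal.
struct match
{
    // Current behavior: the "g" flag on the pattern decides.
    static auto opCall(R, RegEx)(R input, RegEx re)
    {
        return stdMatch(input, re);
    }

    // First match only: returns its captures
    // (the whole match followed by its submatches).
    static auto first(string input, string pattern)
    {
        return stdMatch(input, regex(pattern)).front;
    }

    // All matches as a lazy range, whatever the caller's flags are.
    static auto all(string input, string pattern)
    {
        return stdMatch(input, regex(pattern, "g"));
    }
}

void main()
{
    assert(match("bleh-blah", regex("bl[ea]h")).walkLength == 1);
    assert(match.all("bleh-blah", "bl[ea]h").walkLength == 2);
    assert(match.first("bleh-blah", "bl[ea]h").hit == "bleh");
}
```

Because `first` and `all` return different types (one set of captures versus a range of them), mixing them up shows up at compile time rather than as a silently shorter loop.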

We can go further and introduce the enhancement I long dreamed of:

//'any' or 'test' are also the names to choose from
if(match.anywhere(input, "[0-9]+")) // input is the string to search
{
        //there is at least 1 match (no need for other info)
        ...
}

The reason I want this "shorthand" is that a regex engine can cut a bunch of corners and serve this "is there a match somewhere?" request much, MUCH faster than "where is the first match and all of its submatches?". And many use cases only need this yes/no answer anyway.
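For the shape of the call only, here is a minimal sketch written as a free function on top of the current API. The name anywhere comes from the proposal; the body is an assumption and does not implement the fast path, since it still asks the engine for a full first match and merely discards it.

```d
import std.regex : match, regex;

// Hypothetical helper: true if the pattern matches anywhere in input.
// A real implementation could skip tracking submatches entirely.
bool anywhere(string input, string pattern)
{
    return !match(input, regex(pattern)).empty;
}

void main()
{
    assert(anywhere("order 66", "[0-9]+"));
    assert(!anywhere("no digits here", "[0-9]+"));
}
```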

... that got a bit lengthy - any thoughts, criticism, opinions ?

--
Dmitry Olshansky
