Michel Fortin wrote:
On 2009-02-19 00:35:20 -0500, Andrei Alexandrescu said:

auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");

I don't like `sub`, I mean the name. It makes me think of "substring" more than "substitute". My choice would be to reuse what we have in std.string and augment it to work with regular expressions:

    auto s = replace("abracazoo", regex("a([b-e])", "g"), subex("A$1"));

OK. `subex` is probably a bit of a killer, but I see your point (a subex is not an arbitrary string).

This way it works consistently whether you're using a string or a regular expression: just replace any pattern string with regex(...) and any replacement string with subex(...) ("substitution expression") when you want them to be parsed as such. Omitting subex in the above would make it a plain string replacement, for instance (this way it's easy to use a variable there).
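For comparison, current Phobos (which postdates this discussion) draws the same string/regex line, though under different names: `std.array.replace` for literal text and `std.regex.replaceAll` for patterns. The sketch below assumes a recent D compiler and Phobos; it does not use the proposed `subex` wrapper (the format string "A$1" is passed directly), and the helpers `plainReplace` and `regexReplace` are hypothetical names for illustration.

```d
import std.array : replace;
import std.regex : regex, replaceAll;

// Plain-string replacement (hypothetical helper): every literal "a" becomes "A".
string plainReplace(string input)
{
    return input.replace("a", "A");
}

// Regex replacement (hypothetical helper): "$1" in the format string refers
// to capture group 1, so "a" followed by a letter in [b-e] becomes "A" plus
// that letter. replaceAll handles every match, so no "g" flag is needed.
string regexReplace(string input)
{
    return input.replaceAll(regex("a([b-e])"), "A$1");
}

void main()
{
    import std.stdio : writeln;
    writeln(plainReplace("abracazoo")); // AbrAcAzoo
    writeln(regexReplace("abracazoo")); // AbrAcazoo
}
```

Switching between the two forms means changing one call and one argument type, which is roughly the consistency being argued for here.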

Indeed, that was part of the impetus for making regex a distinct type that participates in larger functions. The only problem is that regex does not work with std.algorithm in an obvious way; e.g., find() works very differently for strings and regexes. At one point I considered trying to integrate them, but decided not to spend that effort right now.
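To make the find() mismatch concrete, here is a hedged sketch against current Phobos (an assumption; this API postdates the thread): `std.algorithm.searching.find` on a string returns the rest of the haystack from the first hit, while `std.regex.matchFirst` returns a `Captures` object holding the match and its groups. The helper names `literalFind` and `regexFindWhole` are hypothetical.

```d
import std.algorithm.searching : find;
import std.regex : matchFirst, regex;

// find() yields the suffix of the input starting at the first occurrence
// (an empty string if there is none).
string literalFind(string input)
{
    return input.find("ac");
}

// matchFirst() yields a Captures object: [0] is the whole match,
// [1] would be capture group 1. Returns null when nothing matches.
string regexFindWhole(string input)
{
    auto m = input.matchFirst(regex("a([b-e])"));
    return m.empty ? null : m[0];
}

void main()
{
    import std.stdio : writeln;
    writeln(literalFind("abracazoo"));    // acazoo
    writeln(regexFindWhole("abracazoo")); // ab
}
```

The two results have different shapes (a suffix range versus a match object), which is why a single generic find() does not cover both cleanly.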

These functions should allow easy substitution of any string or regex pattern with another algorithm for matching the pattern.

And there's no way to get a range of matches using std.string, but there should be, and it should follow the same rule as above: supporting strings and regexes consistently. (Using the `in` operator, as suggested by Bill Baxter, seems a good fit for this function.)

I defined the following in std.algorithm (signatures simplified):

// Split a range by a 1-element separator
Splitter!(...) splitter(Range, Element)(Range input, Element separator);
// Split a range by a subrange separator
Splitter!(...) splitter(Range)(Range input, Range separator);

I then defined this in std.regex:

// Split a range by a regular-expression separator
Splitter!(...) splitter(Range)(Range input, Regex separator);

Now this is very nice because you get to switch from one to another very easily.

foreach (e; splitter(input, ',')) { ... }
foreach (e; splitter(input, ", ")) { ... }
foreach (e; splitter(input, regex(", *"))) { ... }

The speed/flexibility tradeoff is self-evident and under the programmer's control without much fuss, since it's very easy to switch from one form to another.
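The three `splitter` forms above map closely onto current Phobos, where `std.algorithm.iteration.splitter` accepts an element or subrange separator and `std.regex.splitter` accepts a compiled regex. A minimal runnable sketch, assuming a recent D compiler: both modules are imported whole so the name `splitter` resolves across them, and their template constraints leave exactly one viable overload for each separator type. The helpers `byChar`, `bySubstring` and `byPattern` are hypothetical.

```d
import std.algorithm.iteration; // splitter for element/subrange separators
import std.array : array;
import std.regex;               // regex and splitter for regex separators

// Split on a single character (cheapest form).
string[] byChar(string s)      { return splitter(s, ',').array; }

// Split on a fixed substring.
string[] bySubstring(string s) { return splitter(s, ", ").array; }

// Split on a pattern: a comma followed by any number of spaces.
string[] byPattern(string s)   { return splitter(s, regex(", *")).array; }

void main()
{
    import std.stdio : writeln;
    writeln(byChar("a,b,c"));        // ["a", "b", "c"]
    writeln(bySubstring("a, b, c")); // ["a", "b", "c"]
    writeln(byPattern("a,b,  c"));   // ["a", "b", "c"]
}
```

Only the separator argument changes between the three calls; the call site and the result type stay the same.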

And if any of you complains about the extra verbosity, here's what I suggest:

    auto s = replace("abracazoo", re"a([b-e])"g, se"A$1");

Yes, syntactic sugar for declaring regular expressions.


Two other syntactic options are available:

"abracazoo".match(regex("a[b-e]", "g"))
"abracazoo".match("a[b-e]", "g")

I despise the second one, because if you omit regex(...) it makes me think you're checking for string matches, not expression matches. There's nothing in the name of the function telling you you're dealing with a regular expression, so it could easily get confusing.

This is yet another proof that discussion of syntax, notation, and naming will never go out of fashion. I was half convinced by the others that we're in good shape with input.match(regex).
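With D's uniform function call syntax (UFCS), a free function can be called as `input.f(...)`, which is how the `input.match(regex)` form reads. A sketch against current Phobos (an assumption; it uses `matchAll`, a later name, rather than `match`) that collects every match as a range and then as an array; `allHits` is a hypothetical helper. `matchAll` searches for all non-overlapping matches, so no "g" flag is passed.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, regex;

// UFCS: matchAll(input, re) written as input.matchAll(re).
// matchAll yields a range of Captures; .hit is the whole matched text.
string[] allHits(string input)
{
    return input.matchAll(regex("a[b-e]")).map!(m => m.hit).array;
}

void main()
{
    import std.stdio : writeln;
    writeln(allHits("abracazoo")); // ["ab", "ac"]
}
```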


Andrei
