Michel Fortin wrote:
On 2009-02-19 00:35:20 -0500, Andrei Alexandrescu
<[email protected]> said:
auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");
I don't like `sub`, I mean the name. Makes me think of substring more
than substitute. My choice would be to reuse what we have in std.string
and augment it to work with regular expressions:
auto s = replace("abracazoo", regex("a([b-e])", "g"), subex("A$1"));
Ok. Probably subex is a bit of a killer, but I see your point (subex is
not an arbitrary string).
This way it works consistently whether you're using a string or a
regular expression: just replace any pattern string with regex(...) and
any replacement string with subex(...) -- "substitution-expression" --
when you want them to be parsed as such. Omitting subex in the above
would make it a plain string replacement for instance (this way it's
easy to use a variable there).
Indeed, that was part of the impetus for making regex a distinct type
that participates in larger functions. The only problem is that regex
does not work with std.algorithm in an obvious way, e.g. find() works
very differently for strings and regexes. At one point I considered
integrating them, but decided not to spend that effort right now.
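For readers following along, here is a minimal sketch of the string-versus-regex distinction being discussed. It does not use the proposed `sub` or `subex` names, which are only suggestions in this thread. It assumes a present-day Phobos, where `std.array.replace` does plain substring replacement and `std.regex.replaceAll` takes a regex plus a format string; the helper names `plainReplace` and `regexReplace` are made up for illustration.

```d
import std.array : replace;            // plain substring replacement
import std.regex : regex, replaceAll;  // regex-based replacement

// Plain pattern, plain replacement: neither argument is parsed.
// (Hypothetical helper name, for illustration only.)
string plainReplace(string input)
{
    return replace(input, "ab", "Ab");
}

// Regex pattern with a format-string replacement:
// $1 expands to the text of the first capture group.
// (Hypothetical helper name, for illustration only.)
string regexReplace(string input)
{
    return replaceAll(input, regex("a([b-e])"), "A$1");
}

unittest
{
    // Only the literal "ab" is touched.
    assert(plainReplace("abracazoo") == "Abracazoo");
    // Both "ab" and "ac" match; "az" does not.
    assert(regexReplace("abracazoo") == "AbrAcazoo");
}
```

Note that in this sketch the function name, not a wrapper type around the replacement, decides whether `$1` is interpreted, which is a different design from the `subex(...)` idea above.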
These functions should allow easy substitution of any string or regex
pattern with another algorithm for matching the pattern.
And there's no way to get a range of matches using std.string, but
there should be, and it should follow the same rule as above: supporting
strings and regex consistently. (Using the `in` operator as suggested by
Bill Baxter seems a good fit for this function.)
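As a rough illustration of what "a range of matches" could look like, the sketch below assumes a present-day Phobos in which `std.regex.matchAll` returns a lazy range of captures and each capture exposes `hit`. The helper name `allHits` is hypothetical, and this is the regex-only half of the idea; it does not show the string/regex symmetry asked for above.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, regex;

// Collect the matched text of every non-overlapping match of `pattern`.
// `allHits` is a hypothetical helper name used only for this sketch.
string[] allHits(string input, string pattern)
{
    return input.matchAll(regex(pattern)) // lazy range of captures
                .map!(m => m.hit)          // slice of input covered by each match
                .array;                    // materialize for easy comparison
}

unittest
{
    assert(allHits("abracazoo", "a[b-e]") == ["ab", "ac"]);
}
```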
I defined the following in std.algorithm (signatures simplified):
// Split a range by a 1-element separator
Splitter!(...) splitter(Range, Element)(Range input, Element separator);
// Split a range by a subrange separator
Splitter!(...) splitter(Range)(Range input, Range separator);
I then defined this in std.regex:
// Split a range by a regex separator
Splitter!(...) splitter(Range)(Range input, Regex separator);
Now this is very nice because you get to switch from one to another very
easily.
foreach (e; splitter(input, ',')) { ... }
foreach (e; splitter(input, ", ")) { ... }
foreach (e; splitter(input, regex(", *"))) { ... }
The speed/flexibility tradeoff is self-evident and under the control of
the programmer without much fuss as it's very easy to switch from one
form to another.
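The three forms can be compared side by side. This is a sketch under the assumption of a present-day Phobos, where `std.algorithm.iteration.splitter` accepts an element or a subrange separator and `std.regex.splitter` accepts a regex; the wrapper names are hypothetical, and the imports are scoped per function only to keep the two `splitter` symbols visibly apart.

```d
import std.array : array;

// Single-element separator (std.algorithm). Hypothetical wrapper name.
string[] splitByChar(string input)
{
    import std.algorithm.iteration : splitter;
    return splitter(input, ',').array;
}

// Subrange separator (std.algorithm). Hypothetical wrapper name.
string[] splitByString(string input)
{
    import std.algorithm.iteration : splitter;
    return splitter(input, ", ").array;
}

// Regex separator (std.regex). Hypothetical wrapper name.
string[] splitByRegex(string input)
{
    import std.regex : regex, splitter;
    return splitter(input, regex(", *")).array;
}

unittest
{
    // The char form keeps the space after the first comma.
    assert(splitByChar("a, b,c") == ["a", " b", "c"]);
    // The string form only splits where comma and space occur together.
    assert(splitByString("a, b,c") == ["a", "b,c"]);
    // The regex form tolerates an optional run of spaces.
    assert(splitByRegex("a, b,c") == ["a", "b", "c"]);
}
```

The differing results on the same input show the flexibility side of the tradeoff; only the separator argument changes between the three calls.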
And if any of you complains about the extra verbosity, here's what I
suggest:
auto s = replace("abracazoo", re"a([b-e])"g, se"A$1");
Yes, syntactic sugar for declaring regular expressions.
Two other syntactic options are available:
"abracazoo".match(regex("a[b-e]", "g"))
"abracazoo".match("a[b-e]", "g")
I despise the second one, because if you omit regex(...) it makes me
think you're checking for string matches, not expression matches.
There's nothing in the name of the function telling you you're dealing
with a regular expression, so it could easily get confusing.
This is yet another proof that discussion of syntax, notation, and
naming will never go out of fashion. I was half convinced by the others
that we're in good shape with input.match(regex).
Andrei