Re: Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu Thu, 19 Feb 2009 06:50:20 -0800

Michel Fortin wrote:

On 2009-02-19 00:35:20 -0500, Andrei Alexandrescu<[email protected]> said:
auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");
I don't like `sub`, I mean the name. Makes me think of substring more than substitute. My choice would be to reuse what we have in std.string and augment it to work with regular expressions:
    auto s = replace("abracazoo", regex("a([b-e])", "g"), subex("A$1"));

Ok. Probably subex is a bit of a killer, but I see your point (subex is not an arbitrary string).

This way it works consistently whether you're using a string or a regular expression: just replace any pattern string with regex(...) and any replacement string with subex(...) -- "substitution-expression" -- when you want them to be parsed as such. Omitting subex in the above would make it a plain string replacement, for instance (this way it's easy to use a variable there).
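For comparison, the contrast between a literal replacement and a substitution expression can be sketched with present-day Phobos. This is only an illustration under assumptions: it uses std.array.replace and std.regex.replaceAll, not the replace/subex API proposed above, and the helper names literalReplace and patternReplace are hypothetical.

```d
import std.array : replace;            // literal string replacement
import std.regex : regex, replaceAll;  // pattern-based replacement
import std.stdio : writeln;

// Hypothetical helper: "A$1" is inserted as-is, '$' has no special meaning.
string literalReplace(string s)
{
    return s.replace("ab", "A$1");
}

// Hypothetical helper: "$1" refers to the first capture group of the pattern.
string patternReplace(string s)
{
    return s.replaceAll(regex("a([b-e])"), "A$1");
}

void main()
{
    writeln(literalReplace("abracazoo")); // A$1racazoo
    writeln(patternReplace("abracazoo")); // AbrAcazoo
}
```

Here the choice between the two behaviours is made by the function name rather than by the type of the replacement argument, which is exactly the distinction a subex(...) wrapper would move into the type system.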

Indeed, that was part of the impetus for making regex a distinct type that participates in larger functions. The only problem is that regex does not work with std.algorithm in an obvious way, e.g. find() works very differently for strings and regexes. I considered at a point trying to integrate them, but decided to not spend that effort right now.

These functions should allow easy substitution of any string or regex pattern with another algorithm for matching the pattern.
And there's no way to get a range of matches using std.string, but there should be, and it should follow the same rule as above: supporting strings and regex consistently. (Using the `in` operator as suggested by Bill Baxter seems a good fit for this function.)


I defined the following in std.algorithm (signatures simplified):

// Split a range by a 1-element separator
Splitter!(...) splitter(Range, Element)(Range input, Element separator);
// Split a range by a subrange separator
Splitter!(...) splitter(Range)(Range input, Range separator);

I then defined this in std.regex:

// Split a range by a regular-expression separator
Splitter!(...) splitter(Range)(Range input, Regex separator);

Now this is very nice because you get to switch from one to another very easily.


foreach (e; splitter(input, ',')) { ... }
foreach (e; splitter(input, ", ")) { ... }
foreach (e; splitter(input, regex(", *"))) { ... }

The speed/flexibility tradeoff is self-evident and under the control of the programmer without much fuss, as it's very easy to switch from one form to another.
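The three forms above can be tried out with a minimal, self-contained sketch. It assumes the splitter overloads as they exist in present-day Phobos (std.algorithm.iteration.splitter and std.regex.splitter), which may differ in detail from the simplified signatures given here; the helper names byChar, bySubstring and byPattern are hypothetical, and the renamed import is there only to make explicit which module each call resolves to.

```d
import std.algorithm.iteration : splitter;           // element and subrange separators
import std.array : array;
import std.regex : regex, regexSplitter = splitter;  // renamed for clarity only
import std.stdio : writeln;

// Split on a single character: the cheapest form.
string[] byChar(string input)
{
    return input.splitter(',').array;
}

// Split on a fixed substring.
string[] bySubstring(string input)
{
    return input.splitter(", ").array;
}

// Split on a pattern: a comma followed by any number of spaces.
string[] byPattern(string input)
{
    return regexSplitter(input, regex(", *")).array;
}

void main()
{
    writeln(byChar("a,b,c"));       // ["a", "b", "c"]
    writeln(bySubstring("a, b, c")); // ["a", "b", "c"]
    writeln(byPattern("a,b,  c"));   // ["a", "b", "c"]
}
```

Only the separator argument changes between the three helpers, so moving from the fast single-character split to the flexible regex split is a one-token edit at the call site.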

And if any of you complains about the extra verbosity, here's what I suggest:
    auto s = replace("abracazoo", re"a([b-e])"g, se"A$1");

Yes, syntactic sugar for declaring regular expressions.
Two other syntactic options are available:

"abracazoo".match(regex("a[b-e]", "g"))
"abracazoo".match("a[b-e]", "g")
I despise the second one, because if you omit regex(...) it makes me think you're checking for string matches, not expression matches. There's nothing in the name of the function telling you you're dealing with a regular expression, so it could easily get confusing.

This is yet another proof that discussion of syntax, notation, and naming will never go out of fashion. I was half convinced by the others that we're in good shape with input.match(regex).



Andrei
