Re: Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu Thu, 19 Feb 2009 07:55:16 -0800

Denis Koroskin wrote:

On Thu, 19 Feb 2009 18:01:56 +0300, Andrei Alexandrescu <[email protected]> wrote:
bearophile wrote:
Andrei Alexandrescu:
but most regex code I've seen mentions the string first and the regex second. So I dropped that idea.<
 I like the following syntaxes (the one with .match() too):
 import std.re: regex;

 foreach (e; regex("a[b-e]", "g") in "abracazoo")
     writeln(e);

 foreach (e; regex("a[b-e]", "g").match("abracazoo"))
     writeln(e);

 auto re1 = regex("a[b-e]", "g");
 foreach (e; re1.match("abracazoo"))
     writeln(e);

 auto re1 = regex("a[b-e]", "g");
 foreach (e; re1 in "abracazoo")
     writeln(e);
These all put the regex before the string, something many people would find unsavory.
----------------
I like the support of verbose regular expressions too, that ignore whitespace and comments (for example with //...) inserted into the regex itself. This simple thing is able to turn the messy world of regexes into programming again.
 This is an example of usual RE in Python:
 finder = re.compile("^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")
 This is the same RE in verbose mode, in Python still (# is the Python single-line comment syntax):
 finder = re.compile(r"""
    ^ \s*             # start at beginning+ opt spaces
    ( [\[\]] )        # Group 1: opening bracket
        \s*           # optional spaces
        ( [-+]? \d+ ) # Group 2: first number
        \s* , \s*     # opt spaces+ comma+ opt spaces
        ( [-+]? \d+ ) # Group 3: second number
        \s*           # opt spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # opt spaces+ end at the end
    """, flags=re.VERBOSE)
As you can see it's often very positive to indent logically those lines just like code.
Yah, I saw that ECMA introduced comments in regexes too. At some point we'll implement that.
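As a quick check that the two Python patterns quoted above really are the same regex, here is a minimal, self-contained sketch. The names COMPACT, VERBOSE, and parse_interval are made up for illustration and are not part of the original post; the only library calls used are re.compile, the re.VERBOSE flag, and Pattern.match from Python's standard re module.

```python
import re

# Compact form of the pattern from the post (raw string, so the
# backslashes reach the regex engine unchanged).
COMPACT = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")

# The same pattern in verbose mode: whitespace outside character
# classes is ignored and "#" starts a comment.
VERBOSE = re.compile(r"""
    ^ \s*             # start at beginning + optional spaces
    ( [\[\]] )        # Group 1: opening bracket
        \s*           # optional spaces
        ( [-+]? \d+ ) # Group 2: first number
        \s* , \s*     # optional spaces + comma + optional spaces
        ( [-+]? \d+ ) # Group 3: second number
        \s*           # optional spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # optional spaces + end of input
    """, flags=re.VERBOSE)


def parse_interval(text):
    """Hypothetical helper: return the four groups, or None on no match."""
    m = VERBOSE.match(text)
    return m.groups() if m else None


# Both spellings should accept and reject exactly the same inputs.
for sample in ["[1, 2]", " ] -3 , +45 [ ", "[1; 2]", "1, 2"]:
    a, b = COMPACT.match(sample), VERBOSE.match(sample)
    assert (a is None) == (b is None)
    if a is not None:
        assert a.groups() == b.groups()

print(parse_interval("[1, 2]"))  # ('[', '1', '2', ']')
```

Note that either bracket character is accepted in both positions, exactly as in the quoted pattern.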
----------------
Like the other people here, I don't like the following much; it's a misleading overload of the ~ operator:
 "abracazoo" ~ regex("a[b-e]", "g")
 ----------------
 I don't like that "g" argument much, my suggestions:
 RE attributes:
"repeat", "r": Repeat over the whole input string
"ignorecase", "i": case insensitive
"multiline", "m": treat as multiple lines separated by newlines
"verbose", "v": ignores space outside [] and allows comments
And how do you combine them? "repeat, ignorecase"? Writing and parsing such options becomes a little adventure in itself. I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex programming. If not, you'll look up the manual regardless.
Perhaps string.match("a[b-e]", Regex.Repeat | Regex.IgnoreCase); might be better? I don't find "gmi" immediately clear or self-documenting.

I got disabused a very long time ago of the notion that everything about regexes is clear or self-documenting. Really. You just get to a level of understanding that's appropriate for your needs. On that scale, getting used to "gmi" is so low, it's not even worth discussing.
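For readers weighing the two flag styles discussed in this thread, here is a rough Python sketch that puts them side by side. It is written in Python because the std.re interface was still being designed; the helper find and the table _LETTER_FLAGS are hypothetical names invented for this illustration. Python's re module has no "g" flag, so the sketch assumes "g" means "return every match" rather than only the first.

```python
import re

# Hypothetical mapping from one-letter flags to Python's named constants.
_LETTER_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE}


def find(pattern, text, flags=""):
    """Hypothetical helper: return matched substrings, honoring a flag string.

    "g" (an assumption of this sketch) selects all matches instead of the first.
    """
    re_flags = 0
    repeat = False
    for letter in flags:
        if letter == "g":
            repeat = True
        elif letter in _LETTER_FLAGS:
            re_flags |= _LETTER_FLAGS[letter]
        else:
            raise ValueError(f"unknown regex flag: {letter!r}")
    compiled = re.compile(pattern, re_flags)
    if repeat:
        return [m.group(0) for m in compiled.finditer(text)]
    m = compiled.search(text)
    return [m.group(0)] if m else []


# String-flag style, as in regex("a[b-e]", "g"):
print(find("a[b-e]", "abracazoo", "g"))   # ['ab', 'ac']
print(find("a[b-e]", "ABracazoo", "gi"))  # ['AB', 'ac']

# Named-constant style, combined with "|":
print(re.compile("a[b-e]", re.IGNORECASE | re.MULTILINE).findall("ABracazoo"))
# ['AB', 'ac']
```

The flag-string parser is only a few lines long, and combining letters is plain concatenation; the named constants are longer to type but fail at lookup time on a misspelled name instead of inside the parser.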



Andrei
