Re: Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu Thu, 19 Feb 2009 07:05:19 -0800

bearophile wrote:

Andrei Alexandrescu:

but most regex code I've seen mentions the string first and the regex second. So I 
dropped that idea.<


I like the following syntaxes (the one with .match() too):

import std.re: regex;

foreach (e; regex("a[b-e]", "g") in "abracazoo")
     writeln(e);

foreach (e; regex("a[b-e]", "g").match("abracazoo"))
     writeln(e);

auto re1 = regex("a[b-e]", "g");
foreach (e; re1.match("abracazoo"))
     writeln(e);

auto re1 = regex("a[b-e]", "g");
foreach (e; re1 in "abracazoo")
     writeln(e);

These all put the regex before the string, something many people would find unsavory.

----------------

I like the support of verbose regular expressions too, that ignore whitespace 
and comments (for example with //...) inserted into the regex itself. This 
simple thing is able to turn the messy world of regexes into programming again.

This is an example of usual RE in Python:

finder = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")


This is the same RE in verbose mode, in Python still (# is the Python 
single-line comment syntax):

finder = re.compile(r"""
    ^ \s*             # start at beginning+ opt spaces
    ( [\[\]] )        # Group 1: opening bracket
        \s*           # optional spaces
        ( [-+]? \d+ ) # Group 2: first number
        \s* , \s*     # opt spaces+ comma+ opt spaces
        ( [-+]? \d+ ) # Group 3: second number
        \s*           # opt spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # opt spaces+ end at the end
    """, flags=re.VERBOSE)

As you can see it's often very positive to indent logically those lines just 
like code.
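
As a rough check that the two spellings above describe the same pattern, here is a minimal Python sketch that compiles both and runs them against one input. The sample string and the variable names are assumptions made for illustration; they are not from the original message.

```python
import re

# Compact form of the pattern quoted above (written as a raw string).
compact = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")

# Verbose form: whitespace and # comments outside character classes are ignored.
verbose = re.compile(r"""
    ^ \s*             # start + optional spaces
    ( [\[\]] )        # Group 1: opening bracket
    \s*               # optional spaces
    ( [-+]? \d+ )     # Group 2: first number
    \s* , \s*         # optional spaces + comma + optional spaces
    ( [-+]? \d+ )     # Group 3: second number
    \s*               # optional spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # optional spaces + end
    """, flags=re.VERBOSE)

sample = " [ -3 , +12 ] "  # hypothetical input, chosen for illustration

compact_groups = compact.match(sample).groups()
verbose_groups = verbose.match(sample).groups()
print(compact_groups == verbose_groups)
```

Both forms should capture the same four groups; only the readability of the source differs.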

Yah, I saw that ECMA introduced comments in regexes too. At some point we'll implement that.

----------------

Like the other people here, I don't much like the following; it's a misleading 
overload of the ~ operator:

"abracazoo" ~ regex("a[b-e]", "g")

----------------

I don't like that "g" argument much, my suggestions:

RE attributes:
"repeat", "r": Repeat over the whole input string
"ignorecase", "i": case insensitive
"multiline", "m": treat as multiple lines separated by newlines
"verbose", "v": ignores space outside [] and allows comments

And how do you combine them? "repeat, ignorecase"? Writing and parsing such options becomes a little adventure in itself. I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex programming. If not, you'll look up the manual regardless.
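
For comparison only, here is how Python (the language of the verbose example above) handles the combination question. It is a sketch of Python's re module, not of the D API under discussion, and the input text is made up for illustration. Named flags are combined with bitwise OR, while the single-letter forms can be written inline at the start of the pattern.

```python
import re

text = "Abra\nabba\nACID"  # hypothetical multi-line input

# Long, named flags combined with bitwise OR.
named = re.findall(r"^a[b-e]", text, flags=re.IGNORECASE | re.MULTILINE)

# The same two flags as single letters, written inline in the pattern.
inline = re.findall(r"(?im)^a[b-e]", text)

print(named, inline)
```

Neither spelling requires parsing a comma-separated option string, which is the difficulty raised above.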

If not already so, I'd like sub() to take as replacement a string or a callable.


It does, I haven't mentioned it yet. Pass-by-alias of course :o).
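
To illustrate what "a string or a callable" means for a replacement, here is a small sketch using Python's re.sub, which accepts both. It is an analogy only and says nothing about how the D sub() is declared; the variable names are made up.

```python
import re

pattern = re.compile(r"a[b-e]")

# Replacement given as a plain string.
with_string = pattern.sub("_", "abracazoo")

# Replacement given as a callable that receives each match object.
with_callable = pattern.sub(lambda m: m.group(0).upper(), "abracazoo")

print(with_string, with_callable)
```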


Andrei
