bearophile wrote:
Andrei Alexandrescu:

but most regex code I've seen mentions the string first and the regex second. So I 
dropped that idea.<

I like the following syntaxes (the one with .match() too):

import std.re: regex;

foreach (e; regex("a[b-e]", "g") in "abracazoo")
     writeln(e);

foreach (e; regex("a[b-e]", "g").match("abracazoo"))
     writeln(e);

auto re1 = regex("a[b-e]", "g");
foreach (e; re1.match("abracazoo"))
     writeln(e);

auto re2 = regex("a[b-e]", "g");
foreach (e; re2 in "abracazoo")
     writeln(e);

These all put the regex before the string, something many people would find unsavory.
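
For comparison, here is a rough Python analogue of the loops above, where the compiled regex also comes first and the subject string second. It is only an illustration of the iteration, not the proposed D API; Python has no "g" flag, so finditer() plays that role here.

```python
import re

# Compile once; the regex object comes first, the subject string second.
rx = re.compile(r"a[b-e]")

# finditer() walks every non-overlapping match, like the "g" loops above.
matches = [m.group(0) for m in rx.finditer("abracazoo")]
for e in matches:
    print(e)  # prints "ab", then "ac"
```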

----------------

I'd also like support for verbose regular expressions, which ignore whitespace 
and allow comments (for example with //...) inside the regex itself. This 
simple feature can turn the messy world of regexes back into programming.

This is an example of a typical RE in Python:

import re
finder = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")


This is the same RE in verbose mode, in Python still (# is the Python 
single-line comment syntax):

finder = re.compile(r"""
    ^ \s*             # start at beginning+ opt spaces
    ( [\[\]] )        # Group 1: opening bracket
        \s*           # optional spaces
        ( [-+]? \d+ ) # Group 2: first number
        \s* , \s*     # opt spaces+ comma+ opt spaces
        ( [-+]? \d+ ) # Group 3: second number
        \s*           # opt spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # opt spaces+ end at the end
    """, flags=re.VERBOSE)

As you can see, it often helps a lot to indent those lines logically, just 
like code.
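
A small check that the two forms are interchangeable, as a sketch: the names compact, verbose, and samples are made up for this example, and the sample inputs are arbitrary. Both patterns are compiled and compared on a few strings.

```python
import re

# Compact form, as a raw string so the backslashes reach the regex engine.
compact = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")

# Verbose form: whitespace and #-comments outside character classes are ignored.
verbose = re.compile(r"""
    ^ \s*             # start + optional spaces
    ( [\[\]] )        # Group 1: opening bracket
    \s* ( [-+]? \d+ ) # Group 2: first number
    \s* , \s*         # comma with optional spaces
    ( [-+]? \d+ ) \s* # Group 3: second number
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # optional spaces + end
    """, flags=re.VERBOSE)

samples = ["[1, 2]", "  ] -3 ,+40 [ ", "[1 2]", "(1, 2)"]
for s in samples:
    c, v = compact.match(s), verbose.match(s)
    # Both patterns must agree on whether the input matches and on the groups.
    assert (c is None) == (v is None)
    if c:
        assert c.groups() == v.groups()
        print(repr(s), "->", c.groups())
    else:
        print(repr(s), "-> no match")
```

The first two samples match (the pattern accepts either bracket on either side); the last two do not, because one lacks the comma and the other uses parentheses.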

Yah, I saw that ECMA introduced comments in regexes too. At some point we'll implement that.

----------------

Like the other people here, I don't much like the following; it's a misleading 
overload of the ~ operator:

"abracazoo" ~ regex("a[b-e]", "g")

----------------

I don't like that "g" argument much. My suggestions:

RE attributes:
"repeat", "r": Repeat over the whole input string
"ignorecase", "i": case insensitive
"multiline", "m": treat as multiple lines separated by newlines
"verbose", "v": ignores space outside [] and allows comments

And how do you combine them? "repeat, ignorecase"? Writing and parsing such options becomes a little adventure in itself. I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex programming. If not, you'll look up the manual regardless.
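
To illustrate why the single-letter form is cheap to handle, here is a minimal sketch in Python: each character maps to one option, so combining flags is just concatenation and parsing is a loop over the string. LETTER_FLAGS, parse_flags, and search are hypothetical names, not part of any D or Python library, and treating "g" as "return every match" is an assumption based on the usage earlier in the thread.

```python
import re

# Hypothetical mapping from single-letter flags to Python's re flags.
# "g" has no Python equivalent flag, so it is tracked separately.
LETTER_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE}

def parse_flags(spec):
    """Return (re_flags, is_global) for a spec such as "gi"."""
    flags, is_global = 0, False
    for ch in spec:
        if ch == "g":
            is_global = True
        elif ch in LETTER_FLAGS:
            flags |= LETTER_FLAGS[ch]
        else:
            raise ValueError(f"unknown regex flag: {ch!r}")
    return flags, is_global

def search(pattern, text, spec=""):
    """Return all matches if "g" is set, otherwise at most the first one."""
    flags, is_global = parse_flags(spec)
    rx = re.compile(pattern, flags)
    if is_global:
        return [m.group(0) for m in rx.finditer(text)]
    m = rx.search(text)
    return [m.group(0)] if m else []

print(search("a[b-e]", "AbracAzoo", "gi"))  # ['Ab', 'ac']
print(search("a[b-e]", "AbracAzoo", "i"))   # ['Ab']
```

A comma-separated form such as "repeat, ignorecase" would additionally need splitting, trimming, and a decision about whether long and short names can be mixed.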

If not already so, I'd like sub() to take as replacement a string or a callable.

It does, I haven't mentioned it yet. Pass-by-alias of course :o).
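
For readers unfamiliar with the string-or-callable idea, this is how it looks in Python's re.sub, which accepts either form. It is shown only as an illustration of the concept; the D interface (with pass-by-alias) will differ.

```python
import re

rx = re.compile(r"a[b-e]")

# Replacement given as a plain string: every match becomes "X".
with_string = rx.sub("X", "abracazoo")

# Replacement given as a callable: it receives the match object and
# returns the text to substitute, here the match in upper case.
with_callable = rx.sub(lambda m: m.group(0).upper(), "abracazoo")

print(with_string)    # XrXazoo
print(with_callable)  # ABrACazoo
```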


Andrei
