bearophile wrote:
Andrei Alexandrescu:
>but most regex code I've seen mentions the string first and the regex second. So I
dropped that idea.<
I like the following syntaxes (the one with .match() too):
import std.stdio: writeln;
import std.re: regex;
foreach (e; regex("a[b-e]", "g") in "abracazoo")
    writeln(e);
foreach (e; regex("a[b-e]", "g").match("abracazoo"))
    writeln(e);
auto re1 = regex("a[b-e]", "g");
foreach (e; re1.match("abracazoo"))
    writeln(e);
auto re2 = regex("a[b-e]", "g");
foreach (e; re2 in "abracazoo")
    writeln(e);
These all put the regex before the string, something many people would
find unsavory.
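For comparison, here is a small Python sketch of the two argument orders being discussed. The regex-first form uses the standard compiled-pattern method; `matches_in` is a hypothetical string-first helper written only for this illustration, not part of any library.

```python
import re

def matches_in(text, pattern):
    """Hypothetical string-first wrapper: the string comes first, the regex second."""
    return [m.group(0) for m in re.finditer(pattern, text)]

re1 = re.compile(r"a[b-e]")

# Regex-first: the compiled pattern is the receiver, the string is the argument.
regex_first = [m.group(0) for m in re1.finditer("abracazoo")]

# String-first: same matches, different argument order.
string_first = matches_in("abracazoo", re1)

print(regex_first)   # ['ab', 'ac']
print(string_first)  # ['ab', 'ac']
```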
----------------
I like support for verbose regular expressions too, which ignore whitespace
and comments (for example, starting with //) inserted into the regex itself. This
simple feature can turn the messy world of regexes back into programming.
This is an example of a typical RE in Python:
finder = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")
This is the same RE in verbose mode, in Python still (# is the Python
single-line comment syntax):
finder = re.compile(r"""
    ^ \s*            # start at beginning + opt spaces
    ( [\[\]] )       # Group 1: opening bracket
    \s*              # optional spaces
    ( [-+]? \d+ )    # Group 2: first number
    \s* , \s*        # opt spaces + comma + opt spaces
    ( [-+]? \d+ )    # Group 3: second number
    \s*              # opt spaces
    ( [\[\]] )       # Group 4: closing bracket
    \s* $            # opt spaces + end at the end
    """, flags=re.VERBOSE)
As you can see, it often helps a lot to indent those lines logically, just
like code.
Yah, I saw that ECMA introduced comments in regexes too. At some point
we'll implement that.
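As a quick check that verbose mode changes only the layout and not the meaning, the following Python sketch compiles both forms of the pattern quoted above and compares them on the same input. The names `COMPACT`, `VERBOSE`, and `parse_interval` are made up for this illustration.

```python
import re

# Compact form of the pattern from the post (raw string avoids escape warnings).
COMPACT = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")

# Same pattern in verbose mode: whitespace and # comments are ignored.
VERBOSE = re.compile(r"""
    ^ \s*            # start of string, optional spaces
    ( [\[\]] )       # group 1: opening bracket
    \s*              # optional spaces
    ( [-+]? \d+ )    # group 2: first number
    \s* , \s*        # optional spaces, comma, optional spaces
    ( [-+]? \d+ )    # group 3: second number
    \s*              # optional spaces
    ( [\[\]] )       # group 4: closing bracket
    \s* $            # optional spaces, end of string
    """, flags=re.VERBOSE)

def parse_interval(text):
    """Return the four captured groups, or None if the text does not match."""
    m = VERBOSE.match(text)
    return m.groups() if m else None

sample = "[ -3, +12 ["
print(parse_interval(sample))                             # ('[', '-3', '+12', '[')
print(COMPACT.match(sample).groups() == parse_interval(sample))  # True
```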
----------------
Like the other people here, I don't much like the following; it's a misleading
overload of the ~ operator:
"abracazoo" ~ regex("a[b-e]", "g")
----------------
I don't like that "g" argument much. My suggestions:
RE attributes:
"repeat", "r": Repeat over the whole input string
"ignorecase", "i": case insensitive
"multiline", "m": treat as multiple lines separated by newlines
"verbose", "v": ignores space outside [] and allows comments
And how do you combine them? "repeat, ignorecase"? Writing and parsing
such options becomes a little adventure in itself. I think the "g", "i",
and "m" flags are popular enough if you've done any amount of regex
programming. If not, you'll look up the manual regardless.
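To illustrate why single-letter flags combine so easily, here is a minimal Python sketch that parses a flag string such as "gi" one character at a time. The helper `parse_flags` and the table `FLAG_BITS` are hypothetical names for this example; "v" for verbose follows the suggestion above rather than any existing convention, and "g" is returned as a separate boolean because Python's `re` module has no global flag.

```python
import re

# Assumed mapping from single-letter flags to Python's re flags.
FLAG_BITS = {
    "i": re.IGNORECASE,  # case insensitive
    "m": re.MULTILINE,   # treat input as multiple lines
    "v": re.VERBOSE,     # ignore whitespace, allow comments
}

def parse_flags(spec):
    """Parse a flag string like "gim" into (repeat, re_flags)."""
    repeat = False
    bits = 0
    for ch in spec:
        if ch == "g":
            repeat = True          # repeat over the whole input string
        elif ch in FLAG_BITS:
            bits |= FLAG_BITS[ch]  # combine flags with bitwise OR
        else:
            raise ValueError(f"unknown regex flag: {ch!r}")
    return repeat, bits

repeat, bits = parse_flags("gi")
print(repeat)                   # True
print(bits == re.IGNORECASE)    # True
```

With one character per option, combining flags is plain concatenation, and the parser needs no separators or keyword lookup.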
If it doesn't already, I'd like sub() to accept either a string or a callable as the replacement.
It does, I haven't mentioned it yet. Pass-by-alias of course :o).
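For readers unfamiliar with the idea, this is how a replacement that is either a string or a callable looks in Python's `re` module, where `sub` accepts both. It is shown only as an analogy and says nothing about the D function's actual signature.

```python
import re

pattern = re.compile(r"a[b-e]")
text = "abracazoo"

# String replacement: every match becomes the same fixed text.
fixed = pattern.sub("__", text)

# Callable replacement: the function receives each match object
# and returns the text to substitute for it.
upper = pattern.sub(lambda m: m.group(0).upper(), text)

print(fixed)  # __r__azoo
print(upper)  # ABrACazoo
```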
Andrei