Brent Dax writes:
: # ?pat?                /<?f:pat>/    ???
: # /pat/i               m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???
:
: Whoa, those are moving to the front?!?

The problem with options in general is that they can't easily modify
parsing if they come in back. Now in the particular case of /f and /i,
it probably doesn't matter. But I was trying to see if there was some
way to do away with trailing options altogether. This might even
extend to things like:

    qq:s"$interpolates @doesn't %doesn't"

And that's definitely a situation where it changes the parse.

Hmm, if strings have options, they're probably additive, so to add
scalar interpolation you'd want to base it on "q", not "qq":

    q:s"$interpolates @doesn't %doesn't"

On the other hand, that doesn't work for the other things like "qr", so
maybe any of :s, :a, :h turn off default interpolations, so qr:a would
only interpolate arrays, for instance.

: # /pat/x               /pat/
: # /^pat$/m             /^^pat$$/
:
: That's...odd. Is $$ (the variable) going away?

Maybe. It'd be $*PID if so, since it's truly global to the process.
But if not, we could special case $$ inside regexes, just as we already
special case $ itself.

: # \p{prop}             <+prop>       ???
: # \P{prop}             <-prop>       ???
:
: Intriguing.

Yeah, especially when you start stacking them. But maybe we're treading
on [...] territory. It could be argued that <...> is just a generalized
form of POSIX's [:...:] construct.

: # \t                   also <tab>
: # \n                   also <lf> or <nl> (latter matching logical newline)
: # \r                   also <cr>
: # \f                   also <ff>
: # \a                   also <bell>
: # \e                   also <esc>
:
: I can tell you right now that these are going to screw people up.
: They'll try to use these in normal strings and be confused when it
: doesn't work. And you probably won't be able to emit a warning,
: considering how much CGI Perl munches.

I can see pragmatic variants in which those *do* interpolate by default.
And pragmatic variants where they don't.

: # \033                 same
: # \x1B                 same
: # \x{263a}             \x<263a>      ???
:
: Why? Wouldn't we want the same thing to work in quoted strings? (Or
: are those changing syntaxes too?)

I'm just wondering how far I can drive the principle that {} is always a
closure (even though it isn't). I admit that it's probably overkill
here, which is why there are question marks.

: # \c[                  same
: # \N{name}             <name>
: # \l                   same
: # \u                   same
: # \Lstring\E           \L<string>
: # \Ustring\E           \U<string>
:
: So that's changed from whenever you talked about \q{} ?

Possibly. Again, the question is whether {} more strongly imply
something that's not true. But curlies were so overloaded in Perl 5
that I don't think people are going to necessarily expect them to do
only one thing. Still, if <> are taking over the role of "unmarked
metasyntactic delimiters", maybe they belong here too.

: # \E                   gone
: # [\040\t]             \h            plus any Unicode horizontal whitespace
: # [\r\n\ck]            \v            plus any Unicode vertical whitespace
: #
: # \b                   same
: # \B                   same
:
: # \A                   ^
: # \Z                   same?
: # \z                   $
:
: Are you sure that optimizes for the common case?

No, I'm not sure, but we have to clean up the \A...\z mess somehow.

: # \G                   <pos>, but assumed in nested patterns?
: #
: # \1                   $1
: #
: # \Q$var\E             $var          always assumed literal, so $1 is literal backref
:
: So these are reinterpolated every time you backtrack? Are you *trying*
: to destroy regex performance? :^)

They're not interpolated. They're matched, as in string comparison,
just as backrefs are matched right now.

: # $var                 <$var>        assumed to be regex
:
: What if $var is a qr//ed object?

Then it's a pretty easy assumption that it's a regex. :-)

: # =~ $re               =~ /<$re>/    ouch?
:
: I don't see the win.

No difference if $re is qr//, but if it's not, that is the syntax for
forcing $re to be interpreted as a regex.

: # (??{$rule})          <rule>
: # (?{ code })          { code } with failure semantics
: # (?#...)              {"..."} :-)
: # (?:...)              <:...>
: # (?=...)              <before: ...>
: # (?!...)              <!before: ...>
: # (?<=...)             <after: ...>
: # (?<!...)             <!after: ...>
:
: Cute. (Wait a minute, aren't those reversed?)

Nope, I realized they were ambiguous depending on whether you think of
them as declarative or operational, but I settled on the declarative
reading because it works with their being assertions. All the other
options I could think of are either really clunky or similarly
ambiguous.

: # (?>...)              <grab: ...>
: # (?(cond)t|f)         Not sure. Could just use { if ... }
:
: <if(cond):true|false>?

Well, sure, if you're attached to that particular set of punctuation.
But we could also have

    <if cond: ...>
    <elsif cond: ...>
    <else: ...>

On the other hand, I think we'll often see parsers doing things like:

    $TERM = qr/{
        when cond { /.../ }
        when cond { /.../ }
        when cond { /.../ }
        when cond { /.../ }
        when cond { /.../ }
        when cond { /.../ }
        default   { /.../ }
    }/;

So maybe the <> version is:

    <when cond: ...>
    <when cond: ...>
    <when cond: ...>
    <when cond: ...>
    <when cond: ...>
    <default: ...>

(assuming the scoping of "break" can be worked out).

: # Obviously the <word> and <word:...> syntaxes will be user
: # extensible. We have to be able to support full grammars. I
: # consider it a feature that <foo> looks like a non-terminal in
: # standard BNF notation. I do not consider it a misfeature
: # that <foo> resembles an HTML or XML tag, since most of those
: # languages need to be matched with a fancy rule named <tag> anyway.
:
: But that *does* make it harder to define the fancy rules. I could see
: someone defining rules like:
:
:     'gt' => qr/\</,
:     'lt' => qr/\>/
:
: just to get around backslashing everything in sight.

I could see someone saying qr:X or some such.

: # An interesting idea would be that if you say
: #
: #     m<foo: pat>
: #
: # or
: #
: #     m{code}
: #
: # it's as if you said
: #
: #     m/<foo: pat>/
: #
: # or
: #
: #     m/{code}/
:
: I don't know about that one. I often use {} as delimiters on regexen
: because it's a character that doesn't occur in data very often. I think
: the gain of two characters isn't as critical as the loss of options.
:
: Understand, I'm not a regex Luddite. I've been working with yacc and
: lex a lot lately, so I have at least a hint of how powerful formal
: parsing is--and I love all of these features. However, I think that
: syntactically a lot of this is a loss for the average Perl hacker. (Not
: me, not you, and not most of the people on this list--the *average*
: hacker, like the 3s or 4s on PerlMonks.)
:
: The *average* Perl hacker doesn't have much use for embedded code in a
: regex or BNF-like rules. The *average* Perl hacker just wants to do an
: s#<emphasis>(\d{1,3}(\.\d{1,3}){3})</emphasis>#<inet>$1</inet># (an
: early example from "Mastering Regular Expressions"). There's a very
: good chance that he knows exactly what the input data looks like and
: that this will work on it.
:
: For this simple reason, I highly suggest somehow hijacking curlies
: instead, and perhaps making embedded code use two curlies. After all,
: regexes are intimidating enough already. :^)

With respect to Perl 5, I'm trying to unhijack curlies as much as
possible.

Larry