> 1. Sub-rules and backtracking
>
> > <name(expr)>        # call rule, passing Perl args
> > { .name(expr) }     # same thing.
>
> > <name pat>          # call rule, passing regex arg
> > { .name(/pat/) }    # same thing.
>
> Considering perl can't sanely know how to backtrack into a closure, wouldn't
> { .name(expr) } be equal to <name(expr)>: instead? (note the colon)

Nope.  <name(expr)>: is equivalent to { .name(expr) }: .  It does know
how to backtrack into a closure: it skips right by it (or throws an
exception through it... not sure which) and tries again.  Hypotheticals
make this function properly.

> It seems to me that for a rule to be able to backtrack, you would need to
> pass a closure as arg that represents the rest of the match: the rule
> matches, calls the closure, and if the closure returns tries to backtrack
> and calls it again, or returns if all possibilities are exhausted.

Sounds like continuation-passing style.  Yes, you can backtrack through
code with continuation-passing style.  Continuations have yet to be
introduced into the language.

> Related to this: what is the prototype for rules (in case you want to
> manually write or invoke them) ?

    rule somerule($0) {}

If it takes arguments, put them on the end of the signature.  Invoke
them just like subs.

(Just realized something: you can't do {...} on a rule, because that
means match any character three times.)

> 3. Negated assertions
>
> > any assertion that begins with ! is simply negated.
>
> > \P{prop}            <!prop>
> > (?!...)             <!before ...>   # negative lookahead
> > [^[:alpha:]]        <-alpha>
>
> Considering <prop> means "matches a character with property prop", it
> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
> character with property prop", rather than "match a character without
> property prop".

Right.  It has to be.  There is no way to implement it in a sufficiently
general way otherwise.

> 5. Character class semantics
>
> > predefined character classes are just considered intrinsic grammar rules
>
> This means you can place arbitrary rules inside a character class. What
> if the rule has a width unequal to 1 or even variable-width? I can think
> of a few possibilities:
>
> a. Require subrules inside a character class to have a fixed width of 1
> char. (requires a run-time check since the rule might be redefined.. ick)
>
> b. Rules inside a character class are ORed together, an inverted subrule
> is interpreted as [ <!before <subrule>> . ]
>
> c. The whole character class is a zero-width assertion followed by the
> traversal of a single char.
>
> My personal preference is (c), which also means \N is equivalent to <-\n>

Yikes.  Good questions.  Recall that Unicode is sort of like
multi-character matching, so it might be possible to allow
<<anyrule><anyother>>.  That might be a way to specify the parallel
matching of those two rules.  It's entirely likely that I'm wrong.

> 6. Null pattern
>
> > That won't work because it'll look for the :wfoo modifier. However, there
> > are several ways to get the effect you want:
> >     /[:w()foo bar]/
> >     /[:w[]foo bar]/
>
> Tsk tsk Larry, those look like null patterns to me :-)
>
> While I'm on the subject.. why not allow <> as the match-always assertion?
> It might conflict with Huffman encoding, but I certainly don't think <>
> could ethically mean anything other than this. And <!> would of course be
> the match-never assertion.

You could always use <(1)> and <(0)>, which are more SWIMmy :)

> 7. The :: operator
>
> > ::                  # fail all |'s when backtracking
>
> > If you backtrack across it, it fails all the way out of the current
> > list of alternatives.
>
> This suggests that if you do:
>     [ foo [ bar :: ]? | foo ( \w+ ) ]
> that if it backtracks over the :: it will break out of the outermost [],
> since the innermost isn't a list of alternatives.
>
> Or does it simply break out of the innermost group, and are the
> descriptions chosen a bit poorly?

I think it's the latter: it breaks out of the innermost group.  That
would make sense, since a list of alternatives is either surrounded by
brackets or the rule boundaries.

> That's it for now I think.. maybe I'll find more later :)

These were stumpers.  Thanks!  :)

Luke
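
To make the continuation-passing idea from point 1 concrete, here is a
minimal sketch in Python (not Perl 6, since continuations are not in the
language yet).  Every "rule" receives a closure standing for the rest of
the match, calls it, and moves on to its next possibility if that closure
fails.  The names lit, alt, seq and match are hypothetical helpers made up
for this illustration; this is not how the Perl 6 engine is specified to
work.

```python
# Hypothetical mini-matcher in continuation-passing style.
# A rule is a function (text, pos, rest) -> end position or None,
# where `rest` is a closure representing the remainder of the match.

def lit(s):
    """Rule matching the literal string s."""
    def rule(text, pos, rest):
        if text.startswith(s, pos):
            return rest(pos + len(s))
        return None
    return rule

def alt(*rules):
    """Try each alternative; if the rest of the match fails, try the next."""
    def rule(text, pos, rest):
        for r in rules:
            result = r(text, pos, rest)
            if result is not None:
                return result
        return None  # all possibilities exhausted
    return rule

def seq(*rules):
    """Match rules in order by chaining their continuations."""
    def rule(text, pos, rest):
        def run(i, p):
            if i == len(rules):
                return rest(p)
            return rules[i](text, p, lambda q: run(i + 1, q))
        return run(0, pos)
    return rule

def match(rule, text):
    """Return the end position of a match of the whole text, or None."""
    return rule(text, 0, lambda p: p if p == len(text) else None)

# [ a | ab ] c  against "abc": the first alternative matches "a", the
# continuation fails on "c", so alt backtracks and tries "ab".
pattern = seq(alt(lit("a"), lit("ab")), lit("c"))
```

With this sketch, match(pattern, "abc") succeeds only because alt gets
control back after its first alternative's continuation fails, which is
the backtracking behaviour described in the quoted question.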
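
The zero-width distinction from point 3 can be seen with Python's re
module, used here only as a stand-in: (?!\d) plays the role of a negated
assertion and [^\d] the role of a negated character class.  How <!prop>
and <-prop> finally behave in Perl 6 is an assumption of the analogy, not
something this sketch proves.

```python
import re

# Negated assertion: succeeds without consuming anything.
assert_span = re.match(r"(?!\d)", "a1").span()

# Negated character class: must consume exactly one non-digit character.
class_span = re.match(r"[^\d]", "a1").span()

# On an empty string the assertion still succeeds (nothing follows, so
# no digit follows), while the class fails because it needs a character.
assert_on_empty = re.fullmatch(r"(?!\d)", "")
class_on_empty = re.fullmatch(r"[^\d]", "")
```

The assertion yields an empty span at position 0 and the class a span of
width one, so the two forms are only interchangeable when a character is
guaranteed to follow.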