> 1. Sub-rules and backtracking
> 
> >    <name(expr)>          # call rule, passing Perl args
> >    { .name(expr) }       # same thing.
> 
> >    <name pat>            # call rule, passing regex arg
> >    { .name(/pat/) }      # same thing.
> 
> Considering perl can't sanely know how to backtrack into a closure, wouldn't  
> { .name(expr) }  be equal to  <name(expr)>:  instead?  (note the colon)

Nope.  <name(expr)>: is equivalent to { .name(expr) }: .  Perl does know
how to backtrack into a closure:  it skips right past it (or throws an
exception through it... not sure which) and tries again.
Hypotheticals make this work properly.

> It seems to me that for a rule to be able to backtrack, you would need to 
> pass a closure as arg that represents the rest of the match:  the rule 
> matches, calls the closure, and if the closure returns tries to backtrack 
> and calls it again, or returns if all possibilities are exhausted.

Sounds like continuation-passing style.  Yes, you can backtrack
through code with continuation-passing style.  Continuations have yet
to be introduced into the language.
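
To make the scheme described above concrete, here is a minimal sketch in
Python rather than Perl 6. Every name in it (lit, alt, seq, star, match) is
made up for illustration and is not part of any proposed rule API. Each
matcher receives a closure k standing for the rest of the match; if k
returns None, the matcher backtracks and tries its next possibility.

```python
def lit(s):
    """Match the literal string s, then hand off to the continuation."""
    def m(text, pos, k):
        if text.startswith(s, pos):
            return k(pos + len(s))
        return None
    return m

def alt(*ms):
    """Try each alternative in order; move on when the continuation fails."""
    def m(text, pos, k):
        for sub in ms:
            r = sub(text, pos, k)
            if r is not None:
                return r
        return None
    return m

def seq(*ms):
    """Match each sub-matcher in turn, threading the continuation through."""
    def m(text, pos, k):
        def run(i, p):
            if i == len(ms):
                return k(p)
            return ms[i](text, p, lambda q: run(i + 1, q))
        return run(0, pos)
    return m

def star(sub):
    """Greedy repetition: try one more repetition first, else stop here."""
    def m(text, pos, k):
        # The q > pos guard prevents looping forever on zero-width matches.
        r = sub(text, pos, lambda q: m(text, q, k) if q > pos else None)
        return r if r is not None else k(pos)
    return m

def match(m, text):
    """Return the end position of a match anchored at 0, or None."""
    return m(text, 0, lambda end: end)

# a* first grabs all three a's, then gives one back so that "ab" can match.
greedy_then_literal = seq(star(lit("a")), lit("ab"))
end_position = match(greedy_then_literal, "aaab")
```

No continuations in the language sense are needed here: ordinary closures
carry "the rest of the match", and returning None from one is what drives
the backtracking.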


> Related to this: what is the prototype for rules (in case you want to 
> manually write or invoke them) ?

    rule somerule($0) {}

If the rule takes arguments, put them at the end of the signature.
Invoke rules just like subs.

(Just realized something: you can't do {...} on a rule, because that
means match any character three times.)

> 3. Negated assertions
> 
> > any assertion that begins with ! is simply negated.
> 
> >     \P{prop}            <!prop>
> >     (?!...)             <!before ...>   # negative lookahead
> >     [^[:alpha:]]        <-alpha>
> 
> Considering <prop> means "matches a character with property prop", it 
> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a 
> character with property prop", rather than "match a character without 
> property prop".

Right.  It has to be.  There is no way to implement it in a
sufficiently general way otherwise.
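
The difference between the two readings can be seen with Python's re
module, used here only as a stand-in: (?!\d) plays the role of a zero-width
negated assertion and \D the role of "match a character without the
property". This is an analogy, not the Perl 6 syntax.

```python
import re

# \D consumes one non-digit character; (?!\d) consumes nothing.
consuming = re.match(r"\D", "abc")
zero_width = re.match(r"(?!\d)", "abc")

# At end of input there is no character left to consume, so only the
# zero-width form can succeed.
at_end_consuming = re.match(r"\D", "")
at_end_zero_width = re.match(r"(?!\d)", "")
```

The consuming form ends at position 1 and the zero-width form at position
0; on empty input only the zero-width form matches at all.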

> 5. Character class semantics
> 
> > predefined character classes are just considered intrinsic grammar rules
> 
> This means you can place arbitrary rules inside a character class.  What 
> if the rule has a width unequal to 1 or even variable-width?  I can think 
> of a few possibilities:
> 
> a. Require subrules inside a character class to have a fixed width of 1 
> char. (requires a run-time check since the rule might be redefined.. ick)
> 
> b. Rules inside a character class are ORed together, an inverted subrule 
> is interpreted as [ <!before <subrule>> . ]
> 
> c. The whole character class is a zero-width assertion followed by the 
> traversal of a single char.
> 
> My personal preference is (c), which also means \N is equivalent to <-\n>

Yikes.  Good questions.  Recall that Unicode is sort of like
multi-character matching, so it might be possible to allow
<<anyrule><anyother>>.  That might be a way to specify the parallel
matching of those two rules.  It's entirely likely that I'm wrong.
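
For comparison, option (c) from the question can be sketched with Python's
re module as a stand-in: the class becomes a zero-width lookahead followed
by a single character. The "subrule" patterns below are arbitrary examples,
and none of this says how Perl 6 will actually treat rules inside a
character class.

```python
import re

text = "a\nb"

# Negated class as "not before \n, then any one character" ...
via_lookahead = re.findall(r"(?s)(?!\n).", text)
# ... compared with an ordinary negated character class.
via_class = re.findall(r"[^\n]", text)

# A multi-character subrule still consumes exactly one character, because
# the lookahead is zero-width and only the trailing dot advances.
one_char = re.match(r"(?=foo).", "foobar").group()
```

Under this reading the width of the subrule never matters, which is what
makes \N and <-\n> come out equivalent.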

> 6. Null pattern
> 
> > That won't work because it'll look for the :wfoo modifier. However, there
> > are several ways to get the effect you want:
> >     /[:w()foo bar]/ 
> >     /[:w[]foo bar]/
> 
> Tsk tsk Larry, those look like null patterns to me :-)
> 
> While I'm on the subject.. why not allow <> as the match-always assertion?  
> It might conflict with huffman encoding, but I certainly don't think <> 
> could ethically mean anything other than this.  And <!> would of course be 
> the match-never assertion.

You could always use <(1)> and <(0)>, which are more SWIMmy :)

> 7. The :: operator
> 
> >     ::                    # fail all |'s when backtracking
> 
> > If you backtrack across it, it fails all the way out of the current
> > list of alternatives.
> 
> This suggests that if you do:
>      [ foo [ bar :: ]? | foo ( \w+ ) ]
> that if it backtracks over the :: it will break out of the outermost [], 
> since the innermost isn't a list of alternatives.
> 
> Or does it simply break out of the innermost group, and are the 
> descriptions chosen a bit poorly?

I think it's the latter:  it simply breaks out of the innermost group.
That would make sense, since a list of alternatives is surrounded either
by brackets or by the rule boundaries.

> That's it for now I think.. maybe I'll find more later :)

These were stumpers.  Thanks!  :)

Luke
