On Thu, Jan 9, 2014 at 12:38 AM, amon <[email protected]> wrote:

> Thank you *so much*. This is the behavior anyone would expect from a
> *scannerless* interface. It also happens to remove one of the three
> main motivations for my IRIF project :-)

Scannerless still picks the longest match, but the new option is way
better.

> Calling this feature “forgiving” is probably a good idea, although it
> assumes enough familiarity with writing your own lexer for Marpa to
> understand what it means. I think that other names like “variable
> size”, “best length”, “informed lexing”[1], or “context aware
> lexing”[2] might be more beginner-friendly even if it's *implemented*
> as a forgiveness operation – but the question is who you are
> optimizing for. One could also consider that forgiving lexing is
> somewhat backwards compatible (any SLIF grammar that parsed
> successfully will continue to parse the same way with forgiving
> lexing). One might therefore make forgiveness the default and call
> the current behaviour “naive”[3] or “traditional”. But eh, names are
> moot as soon as this is documented.
>
> [1]: amazingly, this awesome term has not yet been coined.
> [2]: see *Context-Aware Scanning For Parsing Extensible Languages* by
> Van Wyk & Schwerdfeger, which seems to describe longest acceptable
> token matching (guessing from the abstract). The disadvantage is that
> you don't want to be misunderstood as saying “context-*sensitive*”.
> [3]: see that Stack Overflow question of mine…

Since “forgiving” describes a token, the names above are not well
suited for marking lexer rules. Other candidates are “fallback”, “too
greedy”, “try next”, and “next on fail”.

> Now I have a few questions concerning the exact semantics.
>
> Here is how the SLIF seems to work with naive lexing:
>
>> all lexemes → find longest → accept that, or fail
>
> Here is how the SLIF seems to work with context aware lexing:
>
>> all lexemes → find longest match that is also accepted, or fail
>
> Is this interpretation correct?

Only if the first LTM-ed rule was marked as forgiving.

> Here is how my mind (and the IRIF and Repa) work:
>
>> all lexemes → find those that *can* be accepted → match longest, or
>> fail
>
> which is desirable in a regex-based scanner that has to test all
> possible tokens sequentially, as it narrows the search space. I
> accordingly refer to this as *longest acceptable token matching*,
> which hints at the different implementation.

Repa by default accepts all matched tokens. My local repository has an
option to enable the mode above, in which only the longest expected
token is accepted.
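To make the difference between the two disciplines concrete, here is a
minimal Python sketch. This is not Marpa or Repa code: the lexeme
table, the "acceptable" parameter (standing in for whatever set of
tokens the parser expects at the current position), and all function
names are invented for illustration.

    import re

    # Toy lexeme table: B ~ 'a'+ and C ~ 'aa', as in the grammar
    # discussed below.
    LEXEMES = {"B": re.compile(r"a+"), "C": re.compile(r"aa")}

    def matches(text, pos):
        """Every lexeme that matches at pos, with its match length."""
        found = {}
        for name, rx in LEXEMES.items():
            m = rx.match(text, pos)
            if m:
                found[name] = m.end() - pos
        return found

    def naive_ltm(text, pos, acceptable):
        """Naive lexing: find the longest match first, then accept it
        or fail."""
        found = matches(text, pos)
        if not found:
            return None
        longest = max(found.values())
        winners = {n for n, l in found.items() if l == longest}
        accepted = winners & acceptable
        return (accepted, longest) if accepted else None

    def longest_acceptable(text, pos, acceptable):
        """Forgiving / longest acceptable token matching: the longest
        match among the lexemes the parser can accept.  Filtering
        before matching, as in IRIF/Repa, gives the same answer while
        narrowing the search space of a regex-based scanner."""
        found = {n: l for n, l in matches(text, pos).items()
                 if n in acceptable}
        if not found:
            return None
        longest = max(found.values())
        return ({n for n, l in found.items() if l == longest}, longest)

    # If only C is acceptable at position 0 of "aaaa", naive LTM fails
    # (B's longer match shadows C), while forgiving lexing finds C:
    print(naive_ltm("aaaa", 0, {"C"}))            # None
    print(longest_acceptable("aaaa", 0, {"C"}))   # ({'C'}, 2)

The sketch returns a *set* of winners rather than a single name, so
ties between equally long acceptable tokens are all reported, which is
one way to realize the "Expected: yes" of question 1 below.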
> 1. In case of multiple distinct longest acceptable tokens at a
> certain position: are all of them still being recognized?
> Expected: yes.
>
> 2. Given the grammar "A ::= B C | C C; B ~ 'a'+; C ~ 'aa'" and the
> input "aaaa": (why) does this fail? Expected for all variants:
> failure, because "B ~ 'a'+" matches the whole input, thus
> starving "C".

B matches "aaaa" and is acceptable; C matches "aa", which is too short,
so B is accepted with length 4, leaving no room for the ending C.
Forgiving does not mean backtracking, so B never gives away any of its
match. Repa will parse this as "C C" in its default mode.
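Here is the same arithmetic as a self-contained Python check (again
illustrative only, not Marpa or Repa code):

    import re

    # Grammar from question 2: A ::= B C | C C; B ~ 'a'+; C ~ 'aa'.
    B, C = re.compile(r"a+"), re.compile(r"aa")
    text = "aaaa"

    # One token per position, no backtracking: at position 0 both B
    # and C are acceptable, B matches 4 characters against C's 2, so
    # B wins and swallows the whole input.
    assert B.match(text, 0).end() == 4
    assert C.match(text, 0).end() == 2
    # Nothing is left for the trailing C, so "A ::= B C" cannot
    # complete: failure.

    # Accept-all-matches (Repa's default): B(4) and C(2) are both
    # offered as alternatives, the parser abandons the dead-end B
    # branch, and two consecutive C tokens cover the input, so
    # "A ::= C C" succeeds.
    mid = C.match(text, 0).end()                  # first C ends at 2
    assert C.match(text, mid).end() == len(text)  # second C ends at 4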
> Thanks!

--
Best regards,
Ruslan.