Re: New developer's version, with forgiving tokens

Jeffrey Kegler Wed, 08 Jan 2014 14:40:10 -0800

Re the first question. The behavior when all tokens are forgiving is what you call "longest acceptable tokens". And that is TOKENS in the plural. If several acceptable tokens are of equal length, all of them are returned.


Question 2 is trickier, so I'll answer separately.


-- jeffrey

On 01/08/2014 12:38 PM, amon wrote:

Thank you /so much/. This is the behavior anyone would expect from a /scannerless/ interface. It also happens to remove one of the three main motivations for my IRIF project :-)

Calling this feature “forgiving” is probably a good idea although it assumes enough familiarity with writing your own lexer for Marpa to understand what it means. I think that other names like “variable size”, “best length”, “informed lexing”[1], or “context aware lexing”[2] might be more beginner-friendly even if it's /implemented/ as a forgiveness operation – but the question is who you are optimizing for. One could also consider that forgiving lexing is somewhat backwards compatible (any SLIF grammar that parsed successfully will continue to parse the same way with forgiving lexing). One might therefore make forgiveness the default and call the current behaviour “naive”[3] or “traditional”. But eh, names are moot as soon as this is documented.

    [1]: amazingly, this awesome term has not yet been coined.
    [2]: see /Context-Aware Scanning For Parsing Extensible
    Languages/ by Van Wyk & Schwerdfeger, which seems to describe
    longest acceptable token matching (guessing from the abstract).
    The disadvantage is that you don't want to have been misunderstood
    as saying “context-/sensitive/”.
    [3]: see that Stack Overflow question of mine…


Now I have a few questions concerning the exact semantics.

Here is how the SLIF seems to work with naive lexing:

    all lexemes → find longest → accept that, or fail

Here is how the SLIF seems to work with context aware lexing:

    all lexemes → find longest match that is also accepted, or fail

Is this interpretation correct?

Here is how my mind (and the IRIF and Repa) work:

    all lexemes → find those that /can/ be accepted → match longest,
    or fail

which is desirable in a regex-based scanner that has to test all possible tokens sequentially, as it narrows the search space. I accordingly refer to this as /longest acceptable token matching/, which hints at the different implementation.
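
The three pipelines above can be sketched in a few lines of Python. This is only an illustration of the descriptions in this message, not Marpa's implementation or API: the lexeme table, the lexeme names and the function names are all made up for the example, and how the naive variant treats ties is an assumption.

```python
import re

# Hypothetical lexeme table (illustrative names, not part of Marpa).
LEXEMES = {
    "kw_if": re.compile(r"if"),
    "ident": re.compile(r"[a-z]+"),
    "word": re.compile(r"[a-z]+"),
}

def match_lengths(text, pos, names):
    """Return {name: length} for each named lexeme that matches at pos."""
    lengths = {}
    for name in names:
        m = LEXEMES[name].match(text, pos)
        if m and m.end() > pos:
            lengths[name] = m.end() - pos
    return lengths

def longest(lengths):
    """Return the set of names sharing the maximum length (empty = failure)."""
    if not lengths:
        return set()
    best = max(lengths.values())
    return {name for name, n in lengths.items() if n == best}

def naive(text, pos, acceptable):
    """all lexemes -> find longest -> accept that, or fail."""
    # Assumption: on a tie, the acceptable winners are kept.
    return longest(match_lengths(text, pos, LEXEMES)) & set(acceptable)

def forgiving(text, pos, acceptable):
    """all lexemes -> find longest match that is also accepted, or fail."""
    lengths = match_lengths(text, pos, LEXEMES)
    return longest({n: k for n, k in lengths.items() if n in acceptable})

def longest_acceptable(text, pos, acceptable):
    """all lexemes -> find those that can be accepted -> match longest."""
    # Only the acceptable lexemes are ever tried against the input.
    return longest(match_lengths(text, pos, acceptable))

print(naive("iffy", 0, {"kw_if"}))               # set()
print(forgiving("iffy", 0, {"kw_if"}))           # {'kw_if'}
print(longest_acceptable("iffy", 0, {"kw_if"}))  # {'kw_if'}
```

In this sketch the last two variants always return the same set of tokens and differ only in how many patterns they try, which is the "different implementation" mentioned above. When two acceptable lexemes match with equal length (here "ident" and "word"), both are returned.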
 1. In case of multiple distinct longest acceptable tokens at a
    certain position:
    Are all of them still being recognized? Expected: yes.

 2. Given the grammar "A ::= B C | C C; B ~ 'a'+; C ~ 'aa'" and the
    input "aaaa":
    (Why) does this fail? Expected for all variants: failure because
    "B ~ 'a'+" matches the whole input, thus starving "C".
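
The expectation in question 2 can be traced with a small Python sketch. It models only the reasoning stated in the question (always take the longest match), not Marpa's actual algorithm, and it makes no claim about what Marpa really does with this grammar. The names `LEXEMES`, `ALTERNATIVES` and `greedy_tokens` are invented for the illustration. Since both alternatives of A start with either B or C, both lexemes are acceptable at position 0, so filtering by acceptability would not change the first choice.

```python
import re

# Lexemes from the question: B ~ 'a'+ and C ~ 'aa'.
LEXEMES = {"B": re.compile(r"a+"), "C": re.compile(r"aa")}

# A ::= B C | C C, written as the token sequences it allows.
ALTERNATIVES = {("B", "C"), ("C", "C")}

def greedy_tokens(text):
    """Tokenize by always taking the longest match; None if nothing matches."""
    pos, tokens = 0, []
    while pos < len(text):
        lengths = {}
        for name, rx in LEXEMES.items():
            m = rx.match(text, pos)
            if m:
                lengths[name] = m.end() - pos
        if not lengths:
            return None
        # Ties are ignored for brevity: the first longest lexeme wins.
        name = max(lengths, key=lengths.get)
        tokens.append(name)
        pos += lengths[name]
    return tuple(tokens)

tokens = greedy_tokens("aaaa")
print(tokens)                  # ('B',)
print(tokens in ALTERNATIVES)  # False
```

At position 0, B matches four characters and C matches two, so B consumes the whole input and nothing is left for C.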

Thanks!
--
You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

