On Thu, Jan 9, 2014 at 12:38 AM, amon <[email protected]> wrote:

> Thank you *so much*. This is the behavior anyone would expect from a
> *scannerless* interface. It also happens to remove one of the three
> main motivations for my IRIF project :-)

Scannerless still picks the longest match, but the new option is way
better.

> Calling this feature “forgiving” is probably a good idea, although it
> assumes enough familiarity with writing your own lexer for Marpa to
> understand what it means. I think that other names like “variable
> size”, “best length”, “informed lexing”[1], or “context aware
> lexing”[2] might be more beginner-friendly even if it's *implemented*
> as a forgiveness operation – but the question is who you are
> optimizing for. One could also consider that forgiving lexing is
> somewhat backwards compatible (any SLIF grammar that parsed
> successfully will continue to parse the same way with forgiving
> lexing). One might therefore make forgiveness the default and call
> the current behaviour “naive”[3] or “traditional”. But eh, names are
> moot as soon as this is documented.
>
> [1]: amazingly, this awesome term has not yet been coined.
> [2]: see *Context-Aware Scanning For Parsing Extensible Languages* by
> Van Wyk & Schwerdfeger, which seems to describe longest acceptable
> token matching (guessing from the abstract). The disadvantage is that
> you don't want to be misunderstood as saying “context-*sensitive*”.
> [3]: see that Stack Overflow question of mine…

Since “forgiving” describes a token, the names above are not well
suited for marking lexer rules. Other candidates are “fallback”, “too
greedy”, “try next”, and “next on fail”.

> Now I have a few questions concerning the exact semantics.
>
> Here is how the SLIF seems to work with naive lexing:
>
>> all lexemes → find longest → accept that, or fail
>
> Here is how the SLIF seems to work with context aware lexing:
>
>> all lexemes → find longest match that is also accepted, or fail
>
> Is this interpretation correct?

Only if the first LTM-ed rule was marked as forgiving.

> Here is how my mind (and the IRIF and Repa) work:
>
>> all lexemes → find those that *can* be accepted → match longest, or
>> fail
>
> which is desirable in a regex-based scanner that has to test all
> possible tokens sequentially, as it narrows the search space. I
> accordingly refer to this as *longest acceptable token matching*,
> which hints at the different implementation.

Repa by default accepts all matched tokens. My local repository has an
option to enable the mode above, in which only the longest expected
token is accepted.
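To make the difference between the two disciplines concrete, here is a
minimal Python sketch. This is not Marpa or Repa code: the lexeme
table, the "acceptable" parameter (standing in for whatever set of
tokens the parser expects at the current position), and all function
names are invented for illustration.

    import re

    # Toy lexeme table: B ~ 'a'+ and C ~ 'aa', as in the grammar
    # discussed below.
    LEXEMES = {"B": re.compile(r"a+"), "C": re.compile(r"aa")}

    def matches(text, pos):
        """Every lexeme that matches at pos, with its match length."""
        found = {}
        for name, rx in LEXEMES.items():
            m = rx.match(text, pos)
            if m:
                found[name] = m.end() - pos
        return found

    def naive_ltm(text, pos, acceptable):
        """Naive lexing: find the longest match first, then accept it
        or fail."""
        found = matches(text, pos)
        if not found:
            return None
        longest = max(found.values())
        winners = {n for n, l in found.items() if l == longest}
        accepted = winners & acceptable
        return (accepted, longest) if accepted else None

    def longest_acceptable(text, pos, acceptable):
        """Forgiving / longest acceptable token matching: the longest
        match among the lexemes the parser can accept.  Filtering
        before matching, as in IRIF/Repa, gives the same answer while
        narrowing the search space of a regex-based scanner."""
        found = {n: l for n, l in matches(text, pos).items()
                 if n in acceptable}
        if not found:
            return None
        longest = max(found.values())
        return ({n for n, l in found.items() if l == longest}, longest)

    # If only C is acceptable at position 0 of "aaaa", naive LTM fails
    # (B's longer match shadows C), while forgiving lexing finds C:
    print(naive_ltm("aaaa", 0, {"C"}))            # None
    print(longest_acceptable("aaaa", 0, {"C"}))   # ({'C'}, 2)

The sketch returns a *set* of winners rather than a single name, so
ties between equally long acceptable tokens are all reported, which is
one way to realize the "Expected: yes" of question 1 below.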
> 1. In case of multiple distinct longest acceptable tokens at a
> certain position: are all of them still being recognized?
> Expected: yes.
>
> 2. Given the grammar "A ::= B C | C C; B ~ 'a'+; C ~ 'aa'" and the
> input "aaaa": (why) does this fail? Expected for all variants:
> failure, because "B ~ 'a'+" matches the whole input, thus
> starving "C".

B matches "aaaa" and is acceptable; C matches "aa", which is too short,
so B is accepted with length 4, leaving no room for the ending C.
Forgiving does not mean backtracking, so B never gives away any of its
match. Repa will parse this as "C C" in its default mode.
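Here is the same arithmetic as a self-contained Python check (again
illustrative only, not Marpa or Repa code):

    import re

    # Grammar from question 2: A ::= B C | C C; B ~ 'a'+; C ~ 'aa'.
    B, C = re.compile(r"a+"), re.compile(r"aa")
    text = "aaaa"

    # One token per position, no backtracking: at position 0 both B
    # and C are acceptable, B matches 4 characters against C's 2, so
    # B wins and swallows the whole input.
    assert B.match(text, 0).end() == 4
    assert C.match(text, 0).end() == 2
    # Nothing is left for the trailing C, so "A ::= B C" cannot
    # complete: failure.

    # Accept-all-matches (Repa's default): B(4) and C(2) are both
    # offered as alternatives, the parser abandons the dead-end B
    # branch, and two consecutive C tokens cover the input, so
    # "A ::= C C" succeeds.
    mid = C.match(text, 0).end()                  # first C ends at 2
    assert C.match(text, mid).end() == len(text)  # second C ends at 4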
> Thanks!

--
Best regards,
Ruslan.