Some of your other motivations for the IRIF will stay strong. For
example, because I want to keep my efforts as language-agnostic as
possible, the SLIF will probably never allow inline Perl code. And I am
not going to change defaults in Marpa::R2, even where the old default
was clearly the inferior choice, because of backward compatibility.
My next step, I think, will be "cheap forgiveness" -- forgiveness where
the cost is O(n) with a low constant, instead of the potential for
O(n**2). That means that simply marking every token "forgiving" becomes
a fairly low-cost and safe thing to do. "Cheap forgiveness" will require
an assist at the Libmarpa level, which will be available to anyone else
using the THIF.
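To illustrate the cost issue, here is a sketch -- purely illustrative,
nothing like Libmarpa's actual internals, with hypothetical helpers
standing in for the matcher and the parser's acceptance test:

    use strict;
    use warnings;

    my $retries = 0;

    # Hypothetical stand-ins for the real matcher and parser.
    sub match_longest  { my ( $input, $pos ) = @_; return length($input) - $pos }
    sub parser_accepts { $retries++; return $_[2] == 1 }

    # A naive "forgiving" lexer: when the parser rejects the longest
    # match, back off one character at a time and try again.
    sub forgiving_lex_at {
        my ( $input, $pos ) = @_;
        my $len = match_longest( $input, $pos );
        while ( $len > 0 ) {
            return $len if parser_accepts( $input, $pos, $len );
            $len--;    # back off: up to O(n) retries per position
        }
        return;    # no acceptable token at this position
    }

    # Worst case: every position backs off all the way to length 1,
    # so n positions cost about n**2/2 acceptance checks in total.
    my $input = 'a' x 100;
    my $pos   = 0;
    $pos += forgiving_lex_at( $input, $pos ) while $pos < length $input;
    print "$retries acceptance checks for ", length($input), " characters\n";

"Cheap forgiveness" would make this cost O(n) overall with a low
constant, rather than leaving the quadratic worst case in place.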
-- jeffrey
On 01/08/2014 12:38 PM, amon wrote:
Thank you /so much/. This is the behavior anyone would expect from a
/scannerless/ interface. It also happens to remove one of the three
main motivations for my IRIF project :-)
Calling this feature “forgiving” is probably a good idea, although it
assumes enough familiarity with writing your own lexer for Marpa to
understand what it means. I think that other names like “variable
size”, “best length”, “informed lexing”[1], or “context aware
lexing”[2] might be more beginner-friendly even if it's
/implemented/ as a forgiveness operation – but the question is who you
are optimizing for. One could also consider that forgiving lexing is
somewhat backwards compatible (any SLIF grammar that parsed
successfully will continue to parse the same way with forgiving
lexing). One might therefore make forgiveness the default and call the
current behaviour “naive”[3] or “traditional”. But eh, names are moot
as soon as this is documented.
[1]: amazingly, this awesome term has not yet been coined.
[2]: see /Context-Aware Scanning For Parsing Extensible
Languages/ by Van Wyk & Schwerdfeger, which seems to describe
longest acceptable token matching (guessing from the abstract).
The disadvantage is that it risks being misunderstood as saying
“context-/sensitive/”.
[3]: see that Stack Overflow question of mine…
Now I have a few questions concerning the exact semantics.
Here is how the SLIF seems to work with naive lexing:
all lexemes → find longest → accept that, or fail
Here is how the SLIF seems to work with context aware lexing:
all lexemes → find longest match that is also accepted, or fail
Is this interpretation correct?
Here is how my mind (and the IRIF and Repa) work:
all lexemes → find those that /can/ be accepted → match longest,
or fail
This ordering is desirable in a regex-based scanner that has to test all
possible tokens sequentially, as it narrows the search space. I
accordingly refer to this as /longest acceptable token matching/,
which hints at the different implementation.
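To make the contrast concrete, here is a small self-contained Perl
sketch of all three strategies; the lexeme table and the $accepts
callback are hypothetical stand-ins, not the Marpa API:

    use strict;
    use warnings;

    # Each lexeme is a [ name, regex ] pair; $accepts stands in for
    # the parser's "is this symbol acceptable here?" test.
    my $lexemes = [ [ B => qr/a+/ ], [ C => qr/aa/ ] ];
    my $accepts = sub { $_[0] eq 'C' };    # pretend only C is acceptable here

    # Naive: find the longest match over all lexemes, then accept or fail.
    sub lex_naive {
        my ( $lexemes, $accepts, $text ) = @_;
        my ( $best, $len ) = ( undef, 0 );
        for my $l (@$lexemes) {
            ( $best, $len ) = ( $l->[0], length $1 )
                if $text =~ /\A($l->[1])/ and length $1 > $len;
        }
        return defined $best && $accepts->($best) ? [ $best, $len ] : undef;
    }

    # Context-aware (forgiving): the longest match that is also accepted.
    sub lex_forgiving {
        my ( $lexemes, $accepts, $text ) = @_;
        my ( $best, $len ) = ( undef, 0 );
        for my $l (@$lexemes) {
            next unless $text =~ /\A($l->[1])/ and length $1 > $len;
            next unless $accepts->( $l->[0] );    # rejected, keep looking
            ( $best, $len ) = ( $l->[0], length $1 );
        }
        return defined $best ? [ $best, $len ] : undef;
    }

    # Longest acceptable token matching: filter to acceptable lexemes
    # first, then match only those -- same result, smaller search space.
    sub lex_longest_acceptable {
        my ( $lexemes, $accepts, $text ) = @_;
        my @candidates = grep { $accepts->( $_->[0] ) } @$lexemes;
        return lex_forgiving( \@candidates, sub { 1 }, $text );
    }

    # On "aaaa": naive finds B (length 4), which is rejected, so it
    # fails; the other two strategies settle on C (length 2).
    for my $strategy ( \&lex_naive, \&lex_forgiving, \&lex_longest_acceptable ) {
        my $t = $strategy->( $lexemes, $accepts, 'aaaa' );
        print $t ? "matched $t->[0] (length $t->[1])\n" : "failed\n";
    }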
1. In case of multiple distinct longest acceptable tokens at a
certain position:
Are all of them still being recognized? Expected: yes.
2. Given the grammar "A ::= B C | C C; B ~ 'a'+; C ~ 'aa'" and the
input "aaaa":
(Why) does this fail? Expected for all variants: failure, because "B
~ 'a'+" matches the whole input, thus starving "C" (see the sketch
after this list).
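For the record, here is roughly how I would check question 2 against
the current (naive) SLIF. This is a sketch: I have spelled 'a'+ as
[a]+, which I believe is how a SLIF sequence rule wants it, and the
failure may surface either in read() or as an undef from value():

    use strict;
    use warnings;
    use Marpa::R2;

    # Question 2's grammar in SLIF form.
    my $dsl = q{
        :start ::= A
        A      ::= B C | C C
        B      ~ [a]+
        C      ~ 'aa'
    };

    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    my $input   = 'aaaa';

    # With naive lexing, B ~ [a]+ should swallow all four characters
    # as a single token, starving C, so no complete parse is expected.
    my $value_ref = eval { $recce->read( \$input ); $recce->value() };
    if ( defined $value_ref ) {
        print "unexpectedly parsed\n";
    }
    else {
        print 'no parse', ( $@ ? ": $@" : "\n" );
    }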
Thanks!