Re: Input System and Parse Events

Jeffrey Kegler Thu, 29 Mar 2018 06:38:56 -0700

I have limited Internet access, so I must be brief. I suggest looking at
Marpa::R3. In particular, the input and event models have been much changed and improved.


On Mar 28, 2018 10:40 PM, "Andreas Kupries" <[email protected]> wrote:

>
> Having completed a first larger example using Marpa/Tcl [4] the next
> example I started to work on has macros and include files and thus
> forces me to look into the missing support for non-contiguous input,
> non-sequential moves in the input, additional input, and the like, and
> of course, parse events.
>
> Before starting on that I had a look at how Marpa::R2 (MR2 from here on)
> does things, for ideas and inspiration.
>
> This mail here is my attempt to summarize what I learned, and verify
> that my mental model is good enough.
>
> # Input first
>
> - Spans and ranges. Ok, not very complex, nothing to say.
>
> - MR2 expects to have the full text to be parsed available.
>
>   => Parsing files requires them to be either read or mapped into
>      memory somehow.
>
>   * No parsing of a stream (socket, pipe, and the like).
>
>   * Generally speaking, no incremental parsing.
>
> - MR2 treats the input text (physical input stream = pIS) as
>   immutable, in terms of its span.
>
>   The proposed way of handling additional input [1] (like the content
>   of include files and other externals) is essentially to allocate a
>   much larger pIS than needed for the actual "natural input", so that
>   there is space after it in the pIS where all the dynamically added
>   text can then go.
>
>   This naturally requires some sort of a-priori estimate, made before
>   parsing starts, of the maximum amount of new text that can appear.
>   (Something a bit fraught with peril, I suspect.)
>
> - MR2 further has a virtual input stream (vIS), essentially a span of
>   the pIS. It may start as the full pIS, or be a sub-string. When
>   handling parse-events `resume` can change this to an arbitrary span
>   too.
>
> - The descriptions of `read` [2] and `resume` [3] mention
>
>   ```
>   [...] is considered successful if it reaches the end of input string, ...
>   ```
>
>   Is `end of input` here always `end of pIS`, or does it instead mean
>   `end of vIS` as set by `read` or the last call to `resume`?
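The pIS/vIS model in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not Marpa::R2 code: `PhysicalInput`, `append_external`, and `Span` are hypothetical names. The pIS is allocated once with a reserve after the natural input, external text (such as an include file) is copied into that reserve, and a vIS is simply a `Span` over the pIS. The `OverflowError` marks the point where the a-priori estimate of added text turns out to be too small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A (start, length) view into the physical input stream (pIS)."""
    start: int
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length


class PhysicalInput:
    """Hypothetical fixed-size pIS: natural input plus a reserved tail.

    The size is fixed at construction, mirroring the immutable span
    described above; text added later must fit into the reserve.
    """

    def __init__(self, natural: str, reserve: int, pad: str = " ") -> None:
        self._buf = list(natural) + [pad] * reserve
        self.natural = Span(0, len(natural))
        self._used = len(natural)  # first free position in the reserve

    def append_external(self, text: str) -> Span:
        """Copy external text (e.g. an include file) into the reserve."""
        if self._used + len(text) > len(self._buf):
            # The a-priori estimate was too small; the pIS cannot grow.
            raise OverflowError("reserved space in the pIS is exhausted")
        span = Span(self._used, len(text))
        self._buf[span.start:span.end] = list(text)
        self._used = span.end
        return span

    def text(self, span: Span) -> str:
        """Characters covered by a span, e.g. the current vIS."""
        return "".join(self._buf[span.start:span.end])


if __name__ == "__main__":
    pis = PhysicalInput('a #include "x" b', reserve=32)
    vis = pis.natural                       # the vIS starts as the natural input
    inc = pis.append_external("contents of x")
    print(pis.text(vis))                    # a #include "x" b
    print(pis.text(inc))                    # contents of x
```

Pointing the vIS at `inc` and later back at the rest of the natural input corresponds to what the mail says `resume` can do with an arbitrary span.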
>
> # And parse events
>
> - When speaking of lexeme vs. non-lexeme events, I suspect that the
>   latter are only about the G1 non-terminal symbols.
>
>   (Strictly speaking, in my mind the L0 symbols that are not lexemes
>   are non-lexemes too.)
>
> - While there is a lot of talk about event location, trigger
>   location, etc., practically speaking the user sees only
>
>   * current location, always, through pos().
>   * lexeme span for lexeme events, through `pause_span`.
>   * lexeme span for discard events, through the event descriptor.
>
>   And pre-lexeme events are the only case where the current location
>   is at the start of the lexeme; for all other events it is set to the
>   end of the lexeme span.
>
> - Looking at the set of methods for use when handling a lexeme event
>   (LE), i.e.
>
>   - lexeme_alternative
>   - lexeme_complete
>   - resume
>
>   I sort of get the model that when __no__ lexeme parse event triggers,
>   the system (L0 engine) automatically runs the internal equivalent of
>
>   ```
>         lexeme_alternative # for all accepted symbols
>         lexeme_complete
>         resume             # after the current lexeme
>   ```
>
>   to pass lexemes to the G1 engine, whereas with an LE in play the
>   responsibility for calling any of these simply passes to the user
>   instead, bypassing the above completely.
>
>   Predictions:
>
>   - Behavior: When multiple lexemes match at a span, and one of them
>     triggers an LE, the other lexemes will be lost.
>
>     This is because there is no accessor to get them all out of the
>     recognizer (`pause_lexeme` is specified to return an arbitrary
>     lexeme from the set: not necessarily the trigger lexeme, and not
>     all of them).
>
>   - Implementation: Triggering of LEs is handled in the L0 engine and
>     its wrapper, likely by hooking into the completion events for
>     collection, and filtering once we know which acceptable symbols
>     exist and what their span is.
>
>   - Implementation: The above mental model (and predicted
>     implementation) makes the statement "Lexeme SLIF parse events are
>     ignored during `lexeme_read`" a trivial thing.
>
>     We are not really "reading" a lexeme with `lexeme_read`.
>
>     We are pushing it to the G1 engine and are very much past the
>     point where the L0 engine wrapper collected and decided on LE
>     handling.
>
>     The other events can still happen because they are handled by the
>     G1 engine we are pushing to. The only exceptions are Discard events:
>     they happen entirely in the L0 wrapper code and are decided on
>     before, or to the side of, LE handling.
>
>     It is actually this split, with non-lexeme events likely handled by
>     the G1 engine and lexeme events by the L0 engine, which convinced me
>     that in `lexeme vs non-lexeme events` the latter means G1
>     non-terminals.
>
>   Open question:
>
>   - When handling an LE, is it possible to specify not only
>     alternatives for a single lexeme, but also a __series__ of
>     lexemes to use before resuming (internal scanning)?
>
>     I currently suspect not. It would be doable with a lexeme side queue
>     that takes precedence over the L0 engine whenever a lexeme is needed,
>     until the queue is empty, and then switching back to L0.
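To make the lexeme-event model above concrete, here is a minimal sketch under stated assumptions. `Driver`, `Lexeme`, `make_scanner`, `expand`, and `MACROS` are hypothetical names, and `lexeme_alternative`, `lexeme_complete`, and `resume` only borrow the method names from the mail; they do not reproduce Marpa::R2 signatures or semantics. The sketch shows the automatic alternative/complete/resume sequence when no LE triggers, the hand-off to a user handler when one does, and the side queue from the open question, which lets a handler feed a series of lexemes (here a macro expansion) before L0 scanning resumes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Lexeme:
    symbol: str
    start: int
    length: int
    value: str


class Driver:
    """Toy model of the L0 -> G1 hand-off; NOT the Marpa::R2 API.

    scan(pos) stands in for the L0 engine: it returns the accepted lexeme
    alternatives at pos (sharing one span), or [] at end of input.
    self.g1 records one list of alternatives per completed step.
    """

    def __init__(self, scan: Callable[[int], List[Lexeme]], event_symbols,
                 on_event: Callable[..., None]) -> None:
        self.scan = scan
        self.event_symbols = set(event_symbols)
        self.on_event = on_event  # user handler for lexeme events (LE)
        self.side_queue: Deque[List[Lexeme]] = deque()
        self.g1: List[List[Lexeme]] = []
        self._pending: List[Lexeme] = []
        self.pos = 0

    def lexeme_alternative(self, lexeme: Lexeme) -> None:
        self._pending.append(lexeme)

    def lexeme_complete(self) -> None:
        self.g1.append(self._pending)
        self._pending = []

    def resume(self, pos: int) -> None:
        self.pos = pos

    def run(self) -> None:
        while True:
            if self.side_queue:
                # Queued lexemes take precedence over L0 scanning until empty.
                for lexeme in self.side_queue.popleft():
                    self.lexeme_alternative(lexeme)
                self.lexeme_complete()
                continue
            alternatives = self.scan(self.pos)
            if not alternatives:
                return
            triggered = [a for a in alternatives if a.symbol in self.event_symbols]
            if triggered:
                # With an LE in play, feeding G1 is the handler's job; the
                # handler must call resume() or the loop would rescan forever.
                self.on_event(self, triggered[0], alternatives)
                continue
            # No LE: the automatic alternative/complete/resume sequence.
            for a in alternatives:
                self.lexeme_alternative(a)
            self.lexeme_complete()
            self.resume(alternatives[0].start + alternatives[0].length)


MACROS: Dict[str, List[str]] = {"$greet": ["hello", "world"]}  # hypothetical


def make_scanner(text: str) -> Callable[[int], List[Lexeme]]:
    """Toy L0 stand-in: one lexeme per space-separated word."""
    def scan(pos: int) -> List[Lexeme]:
        while pos < len(text) and text[pos] == " ":
            pos += 1
        if pos >= len(text):
            return []
        end = text.find(" ", pos)
        end = len(text) if end == -1 else end
        word = text[pos:end]
        symbol = "MACRO" if word.startswith("$") else "WORD"
        return [Lexeme(symbol, pos, end - pos, word)]
    return scan


def expand(driver: Driver, lexeme: Lexeme, alternatives: List[Lexeme]) -> None:
    """LE handler: queue a series of lexemes, then resume after the macro."""
    for word in MACROS[lexeme.value]:
        driver.side_queue.append([Lexeme("WORD", lexeme.start, lexeme.length, word)])
    driver.resume(lexeme.start + lexeme.length)


if __name__ == "__main__":
    d = Driver(make_scanner("say $greet now"), {"MACRO"}, expand)
    d.run()
    print([step[0].value for step in d.g1])  # ['say', 'hello', 'world', 'now']
```

Unlike the MR2 behavior predicted above, this handler receives every alternative at the span, so it could forward the non-triggering ones instead of losing them. That is a design choice of the sketch, not a claim about any Marpa version.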
>
>
> ~~~
> [1] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#External_lexemes_and_the_input_stream
> [2] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#read()
> [3] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#resume()
>
> [4] JSON. Most of the work for it was not the grammar, but getting the
>     underlying Unicode processing correct.
>
> --
> See you,
>         Andreas Kupries <[email protected]>
>                         <http://core.tcl.tk/akupries/>
>         Developer @     SUSE (MicroFocus Canada LLC)
>                         <[email protected]>
>
> EuroTcl 2018, Jul 7-8, Munich/DE, http://eurotcl.eu/
> Tcl'2018, Oct 15-19, Houston, TX, USA. https://www.tcl.tk/community/tcl2018/
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
