I have limited Internet access, so I must be brief. I suggest looking at Marpa::R3. In particular, its input and event models are much changed and improved.
On Mar 28, 2018 10:40 PM, "Andreas Kupries" <[email protected]> wrote:
>
> Having completed a first larger example using Marpa/Tcl [4] the next
> example I started to work on has macros and include files and thus
> forces me to look into the missing support for non-continuous input,
> non-sequential moves in the input, additional input, and the like, and
> of course, parse events.
>
> Before starting on that I had a look at how Marpa::R2 (MR2 later on)
> is doing things, for ideas and inspiration.
>
> This mail here is my attempt to summarize what I learned, and verify
> that my mental model is good enough.
>
> # Input first
>
> - Spans and ranges. Ok, not very complex, nothing to say.
>
> - MR2 expects to have the full text to be parsed available.
>
>   => Parsing files requires them to be either read or mapped into
>      memory somehow.
>
>   * No parsing of a stream (socket, pipe, and the like).
>
>   * Generally speaking, no incremental parsing.
>
> - MR2 treats the input text (physical input stream = pIS) as
>   immutable, in terms of its span.
>
>   The proposed way of handling additional input [1] (like the content
>   of include files and other externals) is to essentially allocate a
>   much larger pIS than needed for the actual "natural input", so that
>   we have space after it in the pIS where all the dynamically added
>   things can then go.
>
>   This naturally requires some sort of a-priori estimation of the max
>   amount of new text which can happen before parsing starts. (Something
>   a bit fraught with peril I suspect).
>
> - MR2 further has a virtual input stream (vIS), essentially a span of
>   the pIS. It may start as the full pIS, or be a sub-string. When
>   handling parse-events `resume` can change this to an arbitrary span
>   too.
>
> - The descriptions of `read` [2] and `resume` mention
>
>   ```
>   [...] is considered successful if it reaches the end of input string, ...
>   ```
>
>   Is `end of input` here always `end of pIS`, or does it instead mean
>   `end of vIS` as set by `read` or last call to `resume` ?
>
> # And parse events
>
> - When speaking of lexeme vs non-lexeme events I suspect that the
>   latter are only about/for the G1 non-terminal symbols.
>
>   (In my mind the non-lexeme L0 symbols are also non-lexeme, strictly
>   speaking)
>
> - While there is a lot of talk about event location, and trigger
>   location, etc. practically speaking the user sees only
>
>   * current location, always, through pos().
>   * lexeme span for lexeme events, through `pause_span`.
>   * lexeme span for discard events, through the event descriptor.
>
>   And pre-lexeme events are the only case where the current location
>   is at the start of the lexeme, everything else has it set to the end
>   of the lexeme span.
>
> - Looking at the set of methods for use when handling a lexeme event
>   (LE), i.e.
>
>   - lexeme_alternative
>   - lexeme_complete
>   - resume
>
>   I sort of get the model that when __no__ lexeme parse event triggers
>   the system (L0 engine) automatically runs the internal equivalent of
>
>   ```
>   lexeme_alternative # for all accepted symbols
>   lexeme_complete
>   resume             # after the current lexeme
>   ```
>
>   to pass lexemes to the G1 engine, whereas with a LE in play the
>   responsibility for calling any of these simply passes to the user
>   instead, bypassing the above completely.
>
>   Predictions:
>
>   - Behavior: When multiple lexemes match at a span, and one of them
>     triggers a LE, the other lexemes will be lost.
>
>     For there is no accessor to get them all out of the recognizer
>     (`pause_lexeme` is specified to return something arbitrary from
>     the set, not the trigger lexeme, not all of them).
>
>   - Implementation: Triggering of LE is handled in the L0 engine and
>     its wrapper, likely by hooking into the completion-events for
>     collection, and filtering when we know that acceptable symbols
>     exist and their span.
>
>   - Implementation: The above mental model (and predicted
>     implementation) makes the statement "Lexeme SLIF parse events are
>     ignored during `lexeme_read`" a trivial thing.
>
>     We are not really "reading" a lexeme with `lexeme_read`.
>
>     We are pushing it to the G1 engine and are very much past the
>     point where the L0 engine wrapper collected and decided on LE
>     handling.
>
>     The other events can still happen because they are handled by the
>     G1 engine we are pushing to. Only exception are Discard events,
>     they happen completely in the L0 wrapper code and are decided on
>     before/to the side of LE.
>
>     It is actually this, of non-lexeme events likely handled by the G1
>     engine vs lexeme events by the L0 engine, which convinced me that
>     the `lexeme vs non-lexeme events` meant G1 non-terminals for the
>     latter.
>
>   Open question:
>
>   - When handling an LE, is it possible to not only specify
>     alternatives of a single lexeme, but also specify a __series__ of
>     lexemes to use before resuming (internal scanning) ?
>
>     I currently suspect not. Would be do-able with a lexeme side queue
>     which is used over the L0 engine when a lexeme is needed, until
>     empty, and then switching back.
>
>
> ~~~
> [1] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#External_lexemes_and_the_input_stream
> [2] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#read()
> [3] http://search.cpan.org/~jkegl/Marpa-R2-4.000000/pod/Scanless/R.pod#resume()
>
> [4] json. Most work for it was not the grammar, but getting the
>     underlying unicode processing correct.
>
> --
> See you,
> Andreas Kupries <[email protected]>
> <http://core.tcl.tk/akupries/>
> Developer @ SUSE (MicroFocus Canada LLC)
> <[email protected]>
>
> EuroTcl 2018, Jul 7-8, Munich/DE, http://eurotcl.eu/
> Tcl'2018, Oct 15-19, Houston, TX, USA.
> https://www.tcl.tk/community/tcl2018/
> -------------------------------------------------------------------------------
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
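
The "lexeme side queue" raised in the open question of the quoted message can be illustrated with a minimal sketch. This is not Marpa::R2 or Marpa/Tcl code: `LexemeSource`, `push`, `next_lexeme`, and the `(symbol, text)` lexeme shape are all hypothetical names chosen for illustration, and the `scan` callback merely stands in for the internal L0 scanner. The idea shown is only the switching rule: queued lexemes are used ahead of the scanner until the queue is empty, then scanning resumes.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# Hypothetical lexeme shape: (symbol name, literal text).
Lexeme = Tuple[str, str]


class LexemeSource:
    """Hands out lexemes, preferring a side queue over the scanner."""

    def __init__(self, scan: Callable[[], Optional[Lexeme]]) -> None:
        self._scan = scan  # stand-in for the internal (L0) scanner
        self._queue: Deque[Lexeme] = deque()

    def push(self, *lexemes: Lexeme) -> None:
        """Queue a series of lexemes, e.g. from a lexeme event handler."""
        self._queue.extend(lexemes)

    def next_lexeme(self) -> Optional[Lexeme]:
        """Return the next queued lexeme, else fall back to the scanner."""
        if self._queue:
            return self._queue.popleft()
        return self._scan()


def demo() -> List[Lexeme]:
    scanned = iter([("word", "a"), ("word", "b")])
    source = LexemeSource(lambda: next(scanned, None))
    out = [source.next_lexeme()]                   # taken from the scanner
    source.push(("macro", "x"), ("macro", "y"))    # handler injects a series
    while (lex := source.next_lexeme()) is not None:
        out.append(lex)                            # queue first, then scanner
    return out
```

In this sketch the consumer (standing in for the G1 engine) never needs to know whether a lexeme came from the queue or from scanning, which is what would let a handler supply several lexemes before internal scanning resumes.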
