Re: Anybody else with an interest in parser wrangling?

Jean Abou Samra Mon, 20 Mar 2023 07:23:18 -0700

On Monday, 20 March 2023 at 00:15 +0100, David Kastrup wrote:
> Jean Abou Samra <[j...@abou-samra.fr](mailto:j...@abou-samra.fr)> writes:
> 
> 
> > On Sunday, 19 March 2023 at 17:51 +0100, David Kastrup wrote:  
> > 
> > > 
> > > So how to better involve others?  The parser may be one of those
> > > areas with an awful amount of shoestring and glue, namely fiddling
> > > around until things happen to work.  All that fiddling happens in
> > > private before commits end up in master, meaning that it has no
> > > opportunity to end up contagious the way it happens now.
> > >  
> > > That's not really fabulous regarding the "bus factor" in that area.
> > 
> > 
> > I would feel a lot more comfortable with modifying the parser if there
> > were an explanation, in code comments or in the CG, of how the
> > parser/lexer interplay works, when lookahead is OK or bad, and how to
> > avoid it when necessary. Things like the comment above MYBACKUP
> > 
> > ```
> > // The following are somewhat precarious constructs as they may change
> > // the value of the lookahead token.  That implies that the lookahead
> > // token must not yet have made an impact on the state stack other
> > // than causing the reduction of the current rule, or switching the
> > // lookahead token while Bison is mulling it over will cause trouble.
> > ```
> > 
> > are obscure to me.
> 
> 
> Well, Bison creates LALR(1) parsers.  That means that the parser is  
> always in a certain state.  It looks at the next token, the "lookahead"  
> token (only one, that's what the 1 in LALR(1) is about) and then  
> transitions into another state, either by shifting (pushing a state  
> onto a stack) or by using a rule to reduce the top of the stack into  
> a production.
> 
> The above comment worries about the possibility that the state  
> machine processes the current lookahead token without consuming it,  
> but then has the lookahead token switched out from under it and ends  
> up in a state that cannot process the replacement token.
> 
> So far, the fears expressed in that comment have not materialized.
> 
> The parser is only able to process a certain subset of languages.  Since  
> the parser makes deterministic progress by either consuming a lookahead  
> token while growing the stack by 1 or by consuming stack material, it  
> ends up O(n), that is, linear in the size of its input.
> 
> When the parser applies a rule, you can specify code that will be  
> executed in the reduction.
> 
> The MYBACKUP and MYREPARSE stuff messes with the input in order to trigger  
> syntactic decisions based on expression values.  That's a bit more than  
> usually expected from a Bison-generated parser.



Yes, I understand the basic way Bison parsers work. What I don't understand is 
what other “effects” the lookahead can have, and why having caused the 
reduction of the current rule is never a problem. AFAIU, the parser works as a 
loop:

- Get next token from lexer.

- Decide whether to shift or to reduce some rule. Use a lookahead token if 
necessary.

- Do the shift or the reduction and execute the semantic action.

The lookahead token gets switched during the semantic action. Isn't it a 
problem if the previous lookahead token says the current rule should be 
reduced, but the new one would have required shifting? Or is that just not a 
useful use of MYBACKUP/MYREPARSE?
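
To make the scenario concrete, here is a toy sketch of that loop in Python. It is not LilyPond's parser and not what Bison generates: the grammar (`NUM`, `+`, `*`), the `parse` function, and the `on_reduce` hook are all made up for illustration, and a real Bison parser can also perform some reductions without reading a lookahead at all. The only point is to show a reduction that was chosen because of one lookahead token, followed by a semantic action that replaces that token.

```python
from collections import deque

END = ("$end", None)


def should_reduce_binary(kinds, lookahead):
    """Decide whether to reduce 'expr OP expr' on top of the stack."""
    if kinds[-3:] == ["expr", "*", "expr"]:
        return True
    # '+' binds less tightly: reduce only if the lookahead is not '*'.
    return kinds[-3:] == ["expr", "+", "expr"] and lookahead[0] != "*"


def parse(tokens, on_reduce=None):
    """Toy shift/reduce evaluator with a single lookahead token.

    on_reduce(op, lookahead) is a hypothetical hook standing in for a
    semantic action; it may return a replacement lookahead, or None.
    """
    pending = deque(tokens)
    stack = []        # (kind, value) pairs
    lookahead = None  # None means "no lookahead token read yet"
    while True:
        # Get the next token from the lexer if none is held.
        if lookahead is None:
            lookahead = pending.popleft() if pending else END
        kinds = [kind for kind, _ in stack]
        # Decide whether to shift or reduce, then do it.
        if kinds[-1:] == ["NUM"]:
            stack[-1] = ("expr", stack[-1][1])          # expr: NUM
        elif should_reduce_binary(kinds, lookahead):
            left, op, right = stack[-3][1], stack[-2][0], stack[-1][1]
            del stack[-3:]
            stack.append(("expr", left * right if op == "*" else left + right))
            # The "semantic action" may switch the lookahead token here.
            replacement = on_reduce(op, lookahead) if on_reduce else None
            if replacement is not None:
                lookahead = replacement
        elif lookahead == END:
            if kinds == ["expr"]:
                return stack[0][1]
            raise SyntaxError(f"unexpected end of input with stack {kinds}")
        else:
            stack.append(lookahead)                     # shift
            lookahead = None


def swap_plus_for_star(op, lookahead):
    """Hypothetical action: replace a '+' lookahead with '*'."""
    return ("*", None) if lookahead[0] == "+" else None


one_plus_two_plus_three = [("NUM", 1), ("+", None), ("NUM", 2),
                           ("+", None), ("NUM", 3)]
one_plus_two_times_three = [("NUM", 1), ("+", None), ("NUM", 2),
                            ("*", None), ("NUM", 3)]

print(parse(one_plus_two_plus_three))                      # 6
print(parse(one_plus_two_times_three))                     # 7
print(parse(one_plus_two_plus_three, swap_plus_for_star))  # 9
```

In the third call, `1 + 2` is reduced because the lookahead was `+`; the hook then swaps in `*`, and the toy parser carries on to compute `(1 + 2) * 3 = 9` instead of the `7` that the token stream `1 + 2 * 3` gives when read directly. In this toy, at least, nothing crashes: the reduction already chosen stands, and the new token only influences later decisions. Whether the actual uses of MYBACKUP/MYREPARSE can ever get into a comparable situation is exactly what I am asking.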
