Jean Abou Samra <j...@abou-samra.fr> writes:

> On Sunday, 19 March 2023 at 17:51 +0100, David Kastrup wrote:
>>
>> So how to better involve others? The parser may be one of those
>> areas with an awful amount of shoestring and glue, namely fiddling
>> around until things happen to work. All that fiddling happens in
>> private before commits end up in master, meaning that it has no
>> opportunity to end up contagious the way it happens now.
>>
>> That's not really fabulous regarding the "bus factor" in that area.
>
> I would feel a lot more comfortable with modifying the parser if there
> was an explanation, in code comments or in the CG, of how the
> parser/lexer interplay works, when lookahead is OK or bad, and how to
> avoid it when necessary. Things like the comment above MYBACKUP
>
> ```
> // The following are somewhat precarious constructs as they may change
> // the value of the lookahead token. That implies that the lookahead
> // token must not yet have made an impact on the state stack other
> // than causing the reduction of the current rule, or switching the
> // lookahead token while Bison is mulling it over will cause trouble.
> ```
>
> are obscure to me.

Well, Bison creates LALR(1) parsers. That means that the parser is always in a certain state. It looks at the next token, the "lookahead" token (only one, that's what the 1 in LALR(1) is about), and then transitions into another state in one of two ways: either it shifts, namely consumes the token and pushes a state onto a stack, or it reduces, namely uses a rule to replace the material on top of the stack with the nonterminal that rule produces.

The above comment worries about the possibility that the state machine acts on the current lookahead token without consuming it, that the lookahead token then gets switched out from under it, and that the parser ends up in a state that is not able to process the replacement token. So far, the fears expressed in that comment have not materialized.

A parser of this kind is only able to process a certain subset of context-free languages. In exchange, since it makes deterministic progress by either consuming a lookahead token while growing the stack by 1 or by consuming stack material, it ends up O(n), namely linear in the size of its input.

When the parser applies a rule, you can specify code that will be executed in the reduction. The MYBACKUP and MYPARSE stuff messes with the input in order to trigger syntactic decisions based on expression values. That's a bit more than usually expected from a Bison-generated parser.

-- 
David Kastrup
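
To make the shift/reduce description concrete, here is a toy sketch in Python. It is not what Bison generates and has nothing to do with LilyPond's actual grammar: the grammar (`expr: NUM | expr '+' NUM`), the function `parse_sum` and the step counter are all made up for illustration, and the stack holds symbols rather than state numbers. It should still show the two points above: reductions happen while the single lookahead token has been read but not consumed, and the number of steps grows linearly with the input.

```python
END = "$end"  # hypothetical end-of-input marker for this toy


def parse_sum(tokens):
    """Toy shift-reduce parse of  expr: NUM | expr '+' NUM.

    Returns (value, steps), where steps counts shift and reduce actions.
    """
    stream = iter(list(tokens) + [END])
    lookahead = next(stream)  # the one and only lookahead token
    stack = []                # (symbol, value) pairs
    steps = 0
    while True:
        symbols = [sym for sym, _ in stack]
        if symbols == ["NUM"]:
            # Reduce by  expr: NUM.  The lookahead has already been read
            # but is not consumed here; swapping it at this point is the
            # kind of thing the quoted comment is nervous about.
            value = stack.pop()[1]
            stack.append(("expr", value))
        elif symbols == ["expr", "+", "NUM"]:
            # Reduce by  expr: expr '+' NUM.  This consumes stack material.
            right = stack.pop()[1]
            stack.pop()
            left = stack.pop()[1]
            stack.append(("expr", left + right))
        elif symbols == ["expr"] and lookahead == END:
            return stack[0][1], steps  # accept
        elif isinstance(lookahead, int) and symbols in ([], ["expr", "+"]):
            # Shift NUM: consume the lookahead, grow the stack by 1.
            stack.append(("NUM", lookahead))
            lookahead = next(stream)
        elif lookahead == "+" and symbols == ["expr"]:
            # Shift '+'.
            stack.append(("+", None))
            lookahead = next(stream)
        else:
            raise SyntaxError(f"unexpected {lookahead!r} with stack {symbols}")
        steps += 1


if __name__ == "__main__":
    print(parse_sum([1, "+", 2, "+", 3]))
```

Every token is shifted exactly once and every reduction removes stack material that some earlier shift put there, so for a sum of n numbers this toy takes 3n - 1 steps.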