Hi Christian,

Sorry I missed your message.  For some reason the title of the thread
was broken in the other answers.

> On 3 Jan 2020, at 13:08, Christian Schoenebeck <schoeneb...@crudebyte.com> wrote:
>
> On Friday, 3 January 2020 11:07:05 CET Akim Demaille wrote:
>> One severe issue brought to my attention by Rici Lake (unfortunately
>> privately, although he had written a very nice and detailed mail with
>> all the details) is that this would break several existing parsers
>> that expect yytname to be this way.  For instance he pointed to
>>
>> https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libksba.git;a=blob;f=src/asn1-parse.y;h=5bff15cd8db64786f7c9e2ef000aeddd583cfdc0;hb=HEAD#l856
>> currently not responding, but the code is:
>> | for (k = 0; k < YYNTOKENS; k++)
>> |   {
>> |     if (yytname[k] && yytname[k][0] == '\"'
>> |         && !strncmp (yytname[k] + 1, string, len)
>> |         && yytname[k][len + 1] == '\"' && !yytname[k][len + 2])
>> |       return yytoknum[k];
>> |   }
>
> Looks like the use case here is to distinguish non-terminals from terminal
> symbols. That could be addressed by introducing some official API function:
>
>   bool yy_is_non_terminal(enum yysymbolid id);
>
> and/or:
>
>   bool yy_is_terminal(enum yysymbolid id);

Not exactly.  The test here is to tell the difference between string
aliases ("break", represented as "\"break\"") and plain symbols
(TOK_BREAK, represented as "TOK_BREAK").  The difference between
terminals and nonterminals is handled by the loop bounds: symbols below
YYNTOKENS are terminals, and from YYNTOKENS on there are only
nonterminals.

Anyway, as I mentioned, I don't want to support this.  And I will not
make it easier.

> Then those double quotes could simply be dropped. Or was there any other use
> case for looking at those double quote characters?

I definitely want to get rid of these quotes!  But not with 'verbose'
error messages, only with 'custom' and 'rich'.

>> I think he is right, hence the call to yysyntax_error_arguments which
>> returns the list of expected/unexpected tokens.
>
> Actually I had a general purpose push API in mind.
> Your suggestion would limit retrieving the "next expected symbols"
> solely to error message purposes.

Yes, I'm focusing on improving the error messages, which is probably
the most common request in recent years.

> Why not make that a general-purpose function instead that users could call
> at any time with the current parser state:
>
>   // returns NULL terminated list
>   const enum yysymbolid* yynextsymbols(const yystate* currentParserState);

I don't want to have to deal with allocating space.  Your proposal
needs to allocate space.  Hence the clumsy interface I provided :)

> Because there are other important use cases that I pointed out to you:
> auto completion features; e.g. interactive command line shells where the user
> can auto complete the currently incomplete command by hitting tab key, or a
> programming language code editor GUI/IDE where the user would get a
> non-obtrusive popup while typing for potential code completions. In these use
> cases you are not (necessarily) addressing syntax errors. The parser might
> very well be in some valid state.

I see your point.

> For that purpose, and to continue the idea about a general purpose push API,
> it would be very useful to have a function for duplicating the current parser
> state:
>
>   yystate* yydupstate(const yystate* parserState);

Wow, you're talking about massive surgery in yacc.c.  Roughly, stop
using local variables for the stacks.  Which is what the push-interface
does (I'm talking about api.push here).  Or are you referring to
push-parsers when you say "push API"?

> and one function to push parse on a specific parser state:
>
>   bool yypushparse(yystate* parserState, char nextchar);
>
> The latter returning false on parser errors. That way people would have a
> very flexible and powerful API for all kinds of use cases. Because by being
> able to duplicate states, you can have "throw away" parser states, where you
> can try out things without touching the "official" parser state.
> For instance I am using that to auto correct user typos in some parsers
> (that is guessing what user had in mind on syntax errors by some limited
> brute force attempts by parser on throw-away parser states).

That might be doable with api.push.  I don't see that coming for the
pull interface.

> But there are many other use cases as well for this: for instance
> multi-threaded parsing tasks where each thread would get its own parser state
> and each thread e.g. might be working on a different branch of a grammar tree
> to reduce latency (overall response time) of a parser system.

Again, that's the kind of thing for api.pure, not the regular yacc.c.

>> I can't make up my mind on whether returning the list of expected
>> tokens as strings (as exemplified above), or simply as their symbol
>> numbers.  Symbol numbers are more efficient, yet they are the
>> *internal* symbol numbers, not the ones the user is exposed to.
>
> I would suggest both. It would make sense to auto generate an enum list for
> all symbols like:
>
>   enum yysymbolid {
>     IDENTIFIER,
>     SWITCH,
>     IF,
>     CONST,
>     ...
>   };
>
> and use that numeric type probably for most Bison APIs for performance
> reasons. That type could also be condensed to a smaller type if requested
> (i.e. for embedded systems):
>
>   enum yysymbolid : uint8_t {
>     IDENTIFIER,
>     SWITCH,
>     IF,
>     CONST,
>     ...
>   };
>
> But there should still be a way for people to convert that conveniently
> to its original string representation from source.y:
>
>   const char* yysymbolname(enum yysymbolid);

Yes, of course.  That's not "both", that's just what I refer to by
"exposing the numbers".  "yysymbolname(x)" is currently just
"yytname[x]".

> Happy new 2k20 BTW! ;-)

Thanks!  Best wishes to you!
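
PS: To make the yytname point above concrete, here is a minimal sketch
of the kind of lookup the libksba snippet performs, run against a
hand-written table.  Everything named demo_* (and DEMO_NTOKENS) is a
made-up stand-in for yytname, yytoknum and YYNTOKENS, and the table
contents and token numbers are invented for illustration, not actual
Bison output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the generated tables.  In a real parser
   these would be yytname, yytoknum and YYNTOKENS.  */
enum { DEMO_NTOKENS = 5 };

static const char *const demo_tname[] = {
  "$end", "error", "$undefined",
  "TOK_BREAK",   /* plain symbol: stored without quotes */
  "\"break\"",   /* string alias: stored with its double quotes */
  "stmt"         /* nonterminal: index >= DEMO_NTOKENS, never scanned */
};

/* Invented external token numbers, one per terminal.  */
static const int demo_toknum[] = { 0, 256, 257, 258, 259 };

/* Return the external token number of the string alias spelled STRING,
   or -1 if no terminal has that alias.  Plain symbols never match,
   because their names do not start with a double quote.  */
static int
demo_find_alias (const char *string)
{
  assert (string != NULL);
  size_t len = strlen (string);
  for (int k = 0; k < DEMO_NTOKENS; k++)
    {
      const char *name = demo_tname[k];
      if (name && name[0] == '"'
          && !strncmp (name + 1, string, len)
          && name[len + 1] == '"' && !name[len + 2])
        return demo_toknum[k];
    }
  return -1;
}
```

With this table, demo_find_alias ("break") finds the alias, while
demo_find_alias ("TOK_BREAK") and demo_find_alias ("stmt") do not: the
first is a plain symbol, the second lies past DEMO_NTOKENS.  This is
exactly the dependency on the quotes in yytname that would break if
they were dropped.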