> On 10 Nov 2018, at 10:38, Hans Åberg <haber...@telia.com> wrote:
>
>> Also, see if using %param does not already
>> give you what you need to pass information from the scanner to the
>> parser’s yyerror.
>
> How would that get into the yyerror function?

In C, arguments of %parse-param are passed to yyerror.  That’s why I
mentioned %param (which adds the argument to both yylex and yyparse),
not %lex-param.  And in the C++ case, these are members.

>>>> I believe that the right approach is rather the one we have in compilers
>>>> and in bison: caret errors.
>>>>
>>>> $ cat /tmp/foo.y
>>>> %token FOO 0xff 0xff
>>>> %%
>>>> exp:;
>>>> $ LC_ALL=C bison /tmp/foo.y
>>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>>>  %token FOO 0xff 0xff
>>>>                  ^^^^
>>>> I would have been bothered by « unexpected 255 ».
>>>
>>> Currently, that’s for those still using only ASCII.
>>
>> No, it’s not, it works with UTF-8.  Bison’s count of characters is mostly
>> correct.  I’m talking about Bison’s own location, used to parse grammars,
>> which is improved compared to what we ship in generated parsers.
>
> Ah. I thought of errors for the generated parser only. Then I only report
> byte count, but using character count will probably not help much for caret
> errors, as they vary in width. Then problem is that caret errors use two
> lines which are hard to synchronize in Unicode. So perhaps some kind of one
> line markup instead might do the trick.

Two things:

One is that the semantics of the column in Bison’s locations is not
specified: it is up to the user to track characters or bytes.  As a
matter of fact, Bison is hardly concerned with this choice; rather,
it’s the scanner that has to deal with it.

The other one is: once you have the location, you can decide how to
display it.  In the case of Bison, I think the caret errors are fine,
but you could decide to do something different, say use colors or
delimiters, to be robust to varying width.

>>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display
>>> properly. In fact, I am using special code to even write out Unicode
>>> characters in the error strings, since Bison assumes all strings are ASCII,
>>> the bytes with the high bit set being translated into escape sequences.
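
To make the bytes-versus-characters point concrete, here is a minimal
sketch in C of what a scanner-side helper might look like.  The names
utf8_code_points and mark_span are hypothetical and not part of Bison.
The first counts code points rather than bytes, assuming the input is
valid UTF-8; the second shows one possible single-line markup (square
brackets around a byte range) that does not depend on display width.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count the code points in the first NBYTES bytes of the UTF-8 text S.
   Continuation bytes have the form 10xxxxxx and never start a code
   point, so they are skipped.  Assumes S is valid UTF-8. */
size_t utf8_code_points(const char *s, size_t nbytes)
{
  size_t n = 0;
  for (size_t i = 0; i < nbytes; ++i)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      ++n;
  return n;
}

/* Copy LINE into OUT, wrapping the bytes [BEGIN, END) in '[' and ']'.
   Offsets are 0-based byte offsets.  Return 0 on success, -1 if the
   range is invalid or OUT is too small. */
int mark_span(char *out, size_t outsize,
              const char *line, size_t begin, size_t end)
{
  size_t len = strlen(line);
  if (begin > end || end > len)
    return -1;
  int n = snprintf(out, outsize, "%.*s[%.*s]%s",
                   (int) begin, line,
                   (int) (end - begin), line + begin,
                   line + end);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

With the grammar line quoted above, the location 1.17-20 corresponds
to the byte range [16, 20) here, and mark_span would produce
"%token FOO 0xff [0xff]" on a single line.  Whether the scanner
advances its column by bytes or by the result of utf8_code_points is
exactly the choice that is left to the user.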
>>
>> Yes, I’m aware of this issue, and we have to address it.
>
> For what I could see, the function that converts it to escapes is sometimes
> applied once and sometimes twice, relying on that it is an idempotent.

It’s a bit more tricky than this.  I’m looking into it, and I’d like to
address this in 3.3.

>> We also have to provide support for internationalization of
>> the token names.
>
> Personally, I don't have any need for that. I use strings, like
>   %token logical_not_key "¬"
>   %token logical_and_key "∧"
>   %token logical_or_key "∨"
> and in the case there are names, they typically match what the lexer
> identifies.

Yes, not all the strings should be translated.  I was thinking of
something like

  %token NUM _("number")
  %token ID _("identifier")
  %token PLUS "+"

This way, we can even point xgettext at the grammar file rather than
at the generated parser.

_______________________________________________
help-bison@gnu.org
https://lists.gnu.org/mailman/listinfo/help-bison