Re: RFC: custom error messages

Christian Schoenebeck Tue, 14 Jan 2020 05:51:33 -0800

On Friday, 10 January 2020 07:23:43 CET Akim Demaille wrote:
> Hi Christian!


Hi Akim,

> > On 9 Jan 2020, at 14:50, Christian Schoenebeck <[email protected]>
> > wrote:
> >
> > On Sunday, 5 January 2020 17:52:43 CET Akim Demaille wrote:
> >>> Why not making that a general-purpose function instead that users could
> >>> call at any time with the current parser state:
> >>> 
> >>> // returns NULL terminated list
> >>> const enum yysymbolid* yynextsymbols(const yystate* currentParserState);
> >> 
> >> I don't want to have to deal with allocating space.  Your proposal
> >> needs to allocate space.  Hence the clumsy interface I provided :)
> > 
> > Well, allocation is just a minor API detail that could easily be
> > addressed.
> 
> I wouldn't call memory management in yacc.c "easy": lots of efforts
> are made to allocate on the call stack, and to avoid malloc.

Sorry, I only meant allocation within the scope of this newly suggested API 
function, not memory allocation in Bison in general, which is of course more 
delicate.

For such a new API function it would be easier, e.g. because you don't have to 
worry about breaking existing users' code. So you could e.g. add an 
optional allocator argument (where NULL would use Bison's default allocation) 
to this new API function, or a macro that could be redefined by users, etc. 
So there would be a bunch of options to address this concern.
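
To illustrate the optional allocator argument, here is a minimal sketch in 
plain C. Nothing in it is existing Bison API: next_symbols, symbol_alloc_fn, 
SYMBOL_LIST_END, arena_alloc and the demo_expected data are all hypothetical 
names chosen for illustration, and the parser-state argument is left out to 
keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list terminator; assumes symbol numbers are non-negative. */
enum { SYMBOL_LIST_END = -1 };

/* Hypothetical allocator callback.  Passing NULL selects malloc. */
typedef void *(*symbol_alloc_fn)(size_t size);

/* Stand-in data: a real implementation would derive the acceptable
   symbols from the parser tables for the current state. */
static const int demo_expected[] = { 3, 7, 9 };

/* Returns a SYMBOL_LIST_END-terminated list, or NULL if allocation fails. */
int *next_symbols(symbol_alloc_fn alloc)
{
  size_t n = sizeof demo_expected / sizeof demo_expected[0];
  if (alloc == NULL)
    alloc = malloc;
  int *list = alloc((n + 1) * sizeof *list);
  if (list == NULL)
    return NULL;
  for (size_t i = 0; i < n; ++i)
    list[i] = demo_expected[i];
  list[n] = SYMBOL_LIST_END;
  return list;
}

/* Counts the entries before the terminator. */
size_t symbol_list_length(const int *list)
{
  assert(list != NULL);
  size_t n = 0;
  while (list[n] != SYMBOL_LIST_END)
    ++n;
  return n;
}

/* Example user allocator: hands out one static buffer, no heap involved. */
static int arena[16];
void *arena_alloc(size_t size)
{
  return size <= sizeof arena ? (void *)arena : NULL;
}
```

A caller that wants to avoid malloc would pass arena_alloc; a caller that 
does not care passes NULL and releases the result with free().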

> >> Yes, of course.  That's not "both", that's just what I refer
> >> to by "exposing the numbers".  "yysymbolname(x)" is currently
> >> just "yytname[x]".
> > 
> > Sure, it is just not clear to me what your actual future plans about
> > yytname[x] are; I see that you are constantly struggling with numerous
> > issues because of people using what were supposed to be skeleton
> > internal-only data structures due to lack of official public APIs. So
> > that was my reason to suggest considering to add official APIs for common
> > specific use cases. E.g.:
> > 
> > /**
> > * Name of the symbol (i.e. token name or LHS grammar rule name)
> > * from input.y
> > */
> > const char* yysymbolname(enum yysymbolid);
> 
> I do not plan to expose an enum for symbol numbers.  What value would
> it bring to give name to these numbers?

To make it clear which internal numbers actually correspond to the user's 
rules and tokens. Once in a while I am also fooled into looking up the wrong 
symbol names when reading the generated parser code manually.

But really, no problem. I totally understand that you don't want to emit enums 
for them.

My primary point here was actually not the enum, but that it might make sense 
to introduce official APIs for common use cases at some point in the future. 
That would provide more freedom to change internal Bison code without 
having to fear breaking existing user code. But again: long-term ideas.

> > On Monday, 6 January 2020 19:23:27 CET Rici Lake wrote:
> >> So I think that there is still time to consider the wider question of how
> >> a
> >> bison grammar might be able to show both literal keywords and
> >> human-readable token descriptions in a way that is useful for both
> >> applications. As a side-benefit, this might also make grammars easier to
> >> read by humans, because the current mechanism does not make it clear to a
> >> human reader in all cases whether a quoted string is intended to be
> >> inserted literally into the parsed text, or whether it is necessary to
> >> hunt
> >> through the grammar's token declarations to find the named token for
> >> which
> >> the quoted string is an alias. (I've been fooled by this several times
> >> while trying to read grammars, particularly when only an extract of the
> >> grammar is presented.)
> > 
> > ++vote;
> 
> I already answered to that.  That's the plan for Bison 4, whose preparation
> has already started.  But that's not for right now, and it's unlikely to
> happen in 2020.

No problem. As far as I can see, the things that you have planned for
Bison 4 are IMO going to be the most powerful and sought-after changes from 
a user's perspective in years, especially this one:

> I fully subscribe to this view, but string literals are definitely not
> the way to go.  So a few months ago I realized that what we really need
> to do is to merge Joel E. Denny's PhD into Bison
> (https://tigerprints.clemson.edu/all_dissertations/519/).
> 
> _That's_ the real way forward.  That's Bison 4.

So IMO all other issues discussed here so far would be in the shadow of this 
major new feature anyway. And to prevent any misapprehension: what I mean is 
that as a user you can customize/circumvent/address certain things, e.g. by 
injecting code, redefining macros etc., but that new feature would clearly be 
beyond all of that.

Do you plan to "merge" the high-level (user-visible) aspects of this built-in 
scanner support feature "as-is", or do you already have ideas about adjusting 
certain high-level aspects (if it's not too early to discuss that at all)?

> > I already read several people saying that it was not possible to address
> > both use cases. What am I missing here?
> > 
> > - It would require to auto generate a 2nd table.
> > 
> > - On the grammar input side it would make sense to handle this issue by
> >   more specific declarations which reflect their intended semantics more
> >   appropriately, e.g.
> >  
> >     %token LE raw="<=" human-err="operator '<='"
> > 
> > if localization is desired (i.e. translations):
> >     %token LE raw="<=" human-err=_("operator '<='")
> 
> Sorry, *I will not support this*.  I made up my mind.  These strings are
> there for error message only.  This feature does not scale for the
> real need, so I will not improve this ill-designed feature that does
> not cover all the needs to generate a scanner.
> 
> Bison 3.6 will improve the generation of error messages for those who
> want to switch to the new system.
> 
> Bison 4 will address the *much* wider issue of the scanner interface.

Sure, it's certainly not everybody's need, and people can address this by 
adding their own tables and lookups for now.
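
A minimal sketch of what such a user-side table could look like, assuming 
hand-maintained data outside of Bison. The token codes (TOK_LE, TOK_GE) and 
all function names below are hypothetical; in a real project the codes would 
come from the header generated for the grammar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes standing in for the generated ones. */
enum token_code { TOK_LE = 258, TOK_GE = 259 };

/* One row per token: the literal spelling and a human-readable name. */
struct token_text {
  int code;
  const char *raw;
  const char *human;
};

static const struct token_text token_texts[] = {
  { TOK_LE, "<=", "operator '<='" },
  { TOK_GE, ">=", "operator '>='" },
};

/* Linear search is fine for a sketch; returns NULL for unknown codes. */
static const struct token_text *find_token_text(int code)
{
  size_t n = sizeof token_texts / sizeof token_texts[0];
  for (size_t i = 0; i < n; ++i)
    if (token_texts[i].code == code)
      return &token_texts[i];
  return NULL;
}

/* Literal spelling, e.g. for completion or pretty-printing. */
const char *token_raw(int code)
{
  const struct token_text *t = find_token_text(code);
  return t ? t->raw : NULL;
}

/* Human-readable description, e.g. for syntax error messages. */
const char *token_human(int code)
{
  const struct token_text *t = find_token_text(code);
  return t ? t->human : NULL;
}

/* Formats an error fragment into buf, falling back to a generic word. */
int describe_unexpected(int code, char *buf, size_t size)
{
  assert(buf != NULL && size > 0);
  const char *human = token_human(code);
  return snprintf(buf, size, "unexpected %s", human ? human : "token");
}
```

The drawback is obvious: the table has to be kept in sync with the %token 
declarations by hand.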

I still think though that it might be a candidate in the long term, simply 
for the sake of readability of grammar sources.

> > On Sunday, 5 January 2020 17:30:18 CET Akim Demaille wrote:
> >>> One could tackle this particular use case also from a different angle:
> >>> We could introduce the concept of "opaque" rules, i.e. rules which are
> >>> not
> >>> expanded when reporting syntax errors.
> >>> 
> >>> E.g., if I could define "unreserved_keyword" as
> >>> 
> >>>> unreserved_keyword [opaque]: ABORT_P | ABSOLUTE_P | <...>
> >>> 
> >>> bison should then create the error message
> >>> 
> >>>> expected: Identifier, unreserved_keyword
> >>> 
> >>> instead of
> >>> 
> >>>> expected: Identifier, <long list containing all unreserved keywords>
> >> 
> >> Too complex, and fitting just one special case.  With EDSLs, the
> >> sheer concept of "keyword" is more complex than this.
> > 
> > Actually I had been thinking about the exact same feature that Adrian
> > suggested here as the "opaque" attribute, though for a different use
> > case: parsers without an external scanner:
> > 
> > CREATE  :  'c''r''e''a''t''e'
> > 
> >        ;
> > 
> > In that case you would like e.g. a syntax error message like
> > 
> >     "Expecting 'create', got 'foo'"
> > 
> > instead of
> > 
> >     "Expecting 'c', got 'f'"
> > 
> > I inject code to handle that ATM. I could imagine this to be controlled by
> > doxygen style comments, something like:
> > 
> > 
> > /**
> > * @symbol-visibility opaque
> > */
> > CREATE  :  'c''r''e''a''t''e'
> > 
> >        ;
> > 
> > That would prevent backward compatibility issues and would handle this
> > "detail" feature in a graceful, non-invasive way.
> > 
> > I could imagine that as an alternative for my %token change suggestions
> > above BTW, that is e.g.:
> > 
> > /**
> > * @raw "<="
> > * @human-err "operator '<='"
> > */
> > %token LE
> 
> You clearly stepped out of the traditional dichotomy between the scanner
> and the parser (for reasons I perfectly understand and respect).  However,
> what you did is just a hack which will be obsoleted in Bison 4, so I will
> not work on features that will be useless.

Sure, let's give that "opaque" issue a rest for now. It all depends on how 
that major new builtin scanner feature evolves exactly. But I am not 100% sure 
yet that this will be resolved by the builtin scanner. We'll see. :-)

Last question: I noticed you mentioned it was already hard enough to test 
Bison code right now. Would it make sense to establish some kind of 
well-defined, distributed test case mechanism for upstream projects? I mean 
in the sense that upstream projects using Bison would write test cases for 
their own specific use cases of Bison, using some kind of defined interface 
that lets you automatically grab, compile and execute them?

Best regards,
Christian Schoenebeck
