[il-antlr-interest: 33310] [antlr-interest] a project observation [Re: 'Dude' error in v3.4 and possible bugs explained [was: on "crap" grammars]]

Vlad Fri, 22 Jul 2011 08:47:47 -0700

re: "the tool no longer generates those sets..." These sets still seem to
appear in the generated code, and they are pushed onto a stack for each rule...
Anyway, thanks for confirming the fact, if not exactly explaining why. I think
it's a bummer: "found X but expected Y or Z" is one of the most useful error
messages a parser can generate... This appears to be a departure in actual
functionality from what the book describes.
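
To make the "found X but expected Y or Z" point concrete, and to mirror the
membership-check fix described in the quoted thread below, here is a minimal
sketch. It assumes a hypothetical stand-in bitset (demo_bitset) and
hypothetical helper names; it is not the ANTLR3 C runtime's bitset type or its
displayRecognitionError(). The key step is asking whether each bit is actually
set before printing the corresponding token name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a token-type bitset (NOT the ANTLR3 runtime type).
 * Bit i set means token type i is acceptable at this point. */
typedef struct {
    uint64_t bits;
} demo_bitset;

static int demo_is_member(const demo_bitset *set, unsigned bit)
{
    return bit < 64 && ((set->bits >> bit) & 1u);
}

/* Appends text to the NUL-terminated string in out, truncating if needed. */
static void demo_append(char *out, size_t out_size, const char *text)
{
    size_t used = strlen(out);
    if (used + 1 < out_size)
        snprintf(out + used, out_size - used, "%s", text);
}

/* Builds a "found X but expected Y or Z" message. A NULL expecting set is
 * reported as unknown instead of being silently ignored. */
static void demo_describe_mismatch(const char *found,
                                   const demo_bitset *expecting,
                                   const char *const *names,
                                   unsigned num_names,
                                   char *out, size_t out_size)
{
    unsigned bit, count = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    demo_append(out, out_size, "found ");
    demo_append(out, out_size, found);

    if (expecting == NULL) {
        demo_append(out, out_size, " (expected set unknown)");
        return;
    }
    for (bit = 1; bit < num_names; bit++) {
        /* Skip token types that are not in the set: this is the check
         * whose absence makes the wrong names appear. */
        if (!demo_is_member(expecting, bit) || names[bit] == NULL)
            continue;
        demo_append(out, out_size, count == 0 ? " but expected " : " or ");
        demo_append(out, out_size, names[bit]);
        count++;
    }
}
```

Called with a name table in which AT_FLOAT_ and AT_INT_ sit at indices 4 and 5
(values assumed here purely for illustration) and a set with those two bits
on, the helper names only those two tokens; without the membership test it
would start from the first non-NULL names in the table instead.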



My initial impression of ANTLR: as a multi-target system it is still en route
to where it wants to be, so I'll wait on using it that way until... v4.x? Or
maybe until there's a working C++ target. ANTLRWorks is not useful to me
personally, but good clean documentation would be. I look at the ANTLR main
site and, despite it teeming with links, it's not easy to find what you need
and be sure it's current. There are a few comments on stackoverflow about the C
target docs being kind of "hidden" and I quite agree (e.g. the "Interacting
with the Generated Code" link loops back to the same page:
http://www.antlr.org/api/C/index.html; or, if you already know about
"apifuncs", you can find this via Google:
http://www.antlr.org/api/C/group__apistructures.html but how do you navigate
there starting from the top if you don't know?).

As another example, here Terence mentions support for immediate left-recursion 
added in v3.4: 
http://stackoverflow.com/questions/212900/advantages-of-antlr-versus-say-lex-yacc-bison
Hmm, this seems interesting because it adds some LALR-like capability but I 
see just a single line in the release notes ("Got immediate left-recursion 
working for rules. Added TestLeftRecursion.java"). I try a couple of easy cases 
and they don't work. I look for TestLeftRecursion.java in the fisheye link off 
the main site and can't find it. Eventually find it in the v3.4 source tarball. 
Take the grammar snippet from testSimple() and try it with my v3.4-complete jar 
and it complains about left recursion. Give up at this point.

As a final example, the "don't use inlined tokens" bit of advice turned out to 
be a red herring. Whether to use a named token or a literal is a 
convenience/maintainability feature -- it might impact how the token is 
presented in user-visible messages (it doesn't with JavaCC, for example), but 
it should not impact grammar correctness. My fix (?) turned out to be to move 
from v3.2 to v3.4 (why is v3.3 of C runtime skipped in the list of downloads?) 
so it was clearly not a grammar design issue. In the process of figuring this
out I discovered core code that could never have worked correctly and that maps
to a feature no longer supported. All this from a 5-line grammar test? Couple
such experiences with the state of the docs, and with the sometimes rather
harsh responses from the target maintainer telling people to go read those very
docs, and that would be enough to put off many would-be users.


On Jul 21, 2011, at 2:37 PM, Jim Idle wrote:

> This was changed because the tool no longer generates those sets.
> 
> Jim
> 
>> -----Original Message-----
>> From: [email protected] [mailto:antlr-interest-
>> [email protected]] On Behalf Of Justin Murray
>> Sent: Thursday, July 21, 2011 12:28 PM
>> To: Vlad
>> Cc: [email protected]
>> Subject: Re: [antlr-interest] 'Dude' error in v3.4 and possible bugs
>> explained [was: on "crap" grammars]
>> 
>> I think that Vlad may be onto something here. From what I can tell from
>> my generated grammar, this only affects ANTLR3_MISMATCHED_SET_EXCEPTION
>> type exceptions. My grammar has several hundred parser rules, but only
>> in 4 cases is an ANTLR3_MISMATCHED_SET_EXCEPTION generated. In all 4
>> cases, the expectingSet is being set to NULL, and in no other cases is
>> expectingSet being set to NULL. I agree that this would be improved if
>> changed as Vlad described.
>> 
>> It just so happens that the way I implemented my exception handling, I
>> treat ANTLR3_MISMATCHED_SET_EXCEPTION the same as
>> ANTLR3_RECOGNITION_EXCEPTION, and don't bother to display the
>> expectingSet, so I never would have discovered this problem.
>> 
>> Since I recently figured out how the C template works, I decided to
>> take a peek. I found the following in
>> antlr-3.4-complete-no-antlrv2.jar/org/antlr/codegen/templates/C/C.stg:
>> 
>> <if(PARSER)>
>> EXCEPTION->expectingSet = NULL;
>> <! use following code to make it recover inline;
>> EXCEPTION->expectingSet = &FOLLOW_set_in_<ruleName><elementIndex>;
>> !>
>> <endif>
>> 
>> So it appears that this was done explicitly at some point. You could
>> edit C.stg to uncomment the code above, and I imagine that it will
>> generate the correct follow set pointer. Perhaps Jim knows why this is
>> like this? This may be avoiding some other problems, so I don't know
>> how safe of a change this would be.
>> 
>> - Justin
>> 
>> On 7/21/2011 2:45 PM, Vlad wrote:
>> 
>>      Previously I was on 3.2 runtime. It occurred to me to try 3.4
>> released a day ago. To this end I've switched to 3.4-beta4 runtime as
>> well. Using one of the testerrors.g grammars with non-inlined int/float
>> tokens and parser generated by antlr-3.4-complete.jar I now get on
>> input string "name : bad":
>> 
>>      <string>(1)  : error 4 : Unexpected token, at offset 6
>>          near [Index: 4 (Start: 31458399-Stop: 31458401) ='bad', type<6> Line: 1 LinePos:6]
>>           : unexpected input...
>>        expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
>>      I could not work out what I was expecting, like so many of us these days!
>> 
>>      (this required switching to antlr3StringStreamNew() from
>> antlr3NewAsciiStringInPlaceStream() as was posted by Jim here:
>> http://groups.google.com/group/il-antlr-interest/browse_thread/thread/981a79239e352c89
>> and as is mentioned within that thread the last argument can't be NULL
>> to avoid a segfault).
>> 
>>      So, this is better because at least the offending token is
>> identified correctly. The reason the expected set is still not
>> identified correctly (the 'Dude' part) is because the generated error
>> path for the 'type' non-terminal always sets the exception's
>> expectingSet to NULL:
>> 
>>              {
>>                  if ( ((LA(1) >= AT_FLOAT_) && (LA(1) <= AT_INT_)) )
>>                  {
>>                      CONSUME();
>>                      PERRORRECOVERY=ANTLR3_FALSE;
>>                  }
>>                  else
>>                  {
>>                      CONSTRUCTEX();
>>                      EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
>>                      EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
>>                      EXCEPTION->expectingSet = NULL; // <--- ????
>> 
>>                      goto ruletypeEx;
>>                  }
>> 
>> 
>>              }
>> 
>>      I might be called names again, but I'd say this error handling
>> does not look correct because the rule knows exactly what token set it
>> expects right here but then goes ahead and ignores that info for the
>> purposes of generating exception info (what's the point in indicating
>> ANTLR3_MISMATCHED_SET_NAME if that set is always set to NULL).
>> 
>>      Examining the generated parser code, I in fact see what appears to
>> be a correct set that would be FOLLOW(':'): it has bits set for
>> AT_FLOAT_ and AT_INT_ and is FOLLOWPUSH()ed before the rule is entered.
>> 
>>      By manually doctoring the parser code to set
>> EXCEPTION->expectingSet to point to this FOLLOW set, I get rid of the 'Dude'
>> message but hit on another bug in displayRecognitionError() that prints
>> the wrong two token names:
>> 
>>      <string>(1)  : error 4 : Unexpected token, at offset 6
>>          near [Index: 4 (Start: 13845599-Stop: 13845601) ='bad', type<6> Line: 1 LinePos:6]
>>           : unexpected input...
>>        expected one of : <EOR>, <DOWN>
>> 
>>      Looking at the stock displayRecognitionError() code, it is clear
>> that the loop over the set bits is not correct (the TODO is right).
>> Fixing it by adding errBits->isMember(errBits, bit):
>> 
>>      for (bit = 1; bit < numbits && count < 8 && count < size; bit++)
>>      {
>>          // TODO: This doesn;t look right - should be asking if the bit is set!!
>>          //
>>          if  (errBits->isMember(errBits, bit) && tokenNames[bit]) // <--- ???? was missing bitset member check
>>          {
>>              ANTLR3_FPRINTF(stderr, "%s%s", count > 0 ? ", " : "", tokenNames[bit]);
>>              count++;
>>          }
>>      }
>> 
>>      finally gets me the error message that makes sense:
>> 
>>      <string>(1)  : error 4 : Unexpected token, at offset 6
>>          near [Index: 4 (Start: 30442591-Stop: 30442593) ='bad', type<6> Line: 1 LinePos:6]
>>           : unexpected input...
>>        expected one of : AT_FLOAT_, AT_INT_
>> 
>> 
>>      "Crap" grammars, I hear somebody said? Hmm, I don't think so...
>> 
>> 
>> 
>> 
>> 
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> email-address
