[il-antlr-interest: 33130] Re: [antlr-interest] Can one identify the type of parser needed for a given BNF grammar

Loring Craymer Mon, 11 Jul 2011 20:59:56 -0700

Sigh.  'class' is a keyword in C++, so your first statement is wrong.  What you 
propose is exactly what was done in the NEXT/Wigg parser; the cost is in parser 
synpreds (sempreds in Wigg's grammars; ANTLR 2 did not hoist synpreds, so Wigg 
had to work from the PCCTS generated code in getting a workable ANTLR 2 
grammar) and tremendous backtracking overhead.  I rather suspect that the 
result is considerably slower than the multi-pass approach would be; the 
NEXT/Wigg grammars do lots of backtracking as a matter of course.  From 
studying the Wigg grammar, I estimate that there is at least a factor of two 
overhead from backtracking; tree walkers are faster than parsers (excluding 
actions) as the result of having a more canonical form (less lookahead; k=1 is 
normal).


By separating semantic analysis from syntactic analysis, you can support more 
comprehensive error reporting--multiple errors reported without having to 
implement parser error recovery--with line/column numbers (stored in the 
tokens).  Being in a rush to report the first error encountered is silly:  most 
of us can afford to wait an extra few milliseconds (or perhaps longer; more 
comprehensive error reporting costs in terms of display time).

Symbol tables are neither magic nor a "one size fits all" technology--there is 
no one true SymbolTable class.  The information stored in symbol table entries 
varies across languages, and there may be multiple symbol tables for a given 
language (having a type table separate from the identifier symbol table is 
common).

Folding preprocessing into the lexer can be done; I think Jim Idle did that for 
C, but I could be mistaken.  Whether the added complexity is worth the improved 
processing time over having a separate preprocessor, I have no idea.  
Certainly, that is best done as an optimization step; using a separate 
preprocessor for early development simplifies the problem of finding bugs in 
the lexer.  It is never a good idea to unnecessarily add error sources.

So:  you are wrong on almost all counts.  You need more practical experience 
with language translation, and a bit of "book learning" would not be out of 
order.

--Loring




>________________________________
>From: Douglas Godfrey <douglasgodf...@gmail.com>
>To: Loring Craymer <lgcray...@yahoo.com>
>Cc: The Researcher <researcher0...@gmail.com>; antlr-interest@antlr.org
>Sent: Monday, July 11, 2011 4:33 PM
>Subject: Re: [antlr-interest] Can one identify the type of parser needed for a 
>given BNF grammar
>
>
>If the SymbolTable is not built by the Parser then you cannot generate the 
>correct AST
>for a class definition vs a function call. Also you cannot generate prompt 
>errors close to
>the point where the error occurs in the parse. 90+% of all semantic 
>ambiguities can be 
>resolved by a SymbolTable in the Parser. Most C++ programs will not have any 
>statements 
>that cannot be resolved inline in the parser and would need to be resolved in 
>a later AST
>TreeParser phase. The 90% case should not pay the compile time cost of a 
>separate 
>TreeParser just to resolve ambiguities that can be handled by the Parser.
>
>No feedback is required for the Lexer. The Lexer can resolve all pre-processor 
>statements
>before the Parser is invoked. The Lexer uses it's own primitive SymbolTable to 
>handle 
>text substitutions and macros. Again, 90+% of all C++ programs do not use any 
>complex
>macros where a second lexer pass would be needed. For simple includes and text 
>substitutions the Lexer can handle them inline. In 10% of the cases The Lexer 
>needs 2 
>passes where the first pass processes all includes, macros and substitutions 
>and the 
>second pass generates the token stream for the Parser.  
>
>
>On Mon, Jul 11, 2011 at 5:16 PM, Loring Craymer <lgcray...@yahoo.com> wrote:
>
>On 3.) :  The parser just recognizes syntax and ignores semantic ambiguities;
>>then in a first tree walker pass, the symbol tables are constructed; a second
>>tree walker pass uses the symbol table information to resolve ambiguities 
>>(which
>>of several interpretations of valid syntax is correct) and rewrites the tree
>>into an unambiguous form.  Then you have finished recognizing C++ and can
>>continue with further processing.  Never, never, never would I suggest adding 
>>a
>>feedback mechanism that couples parser to lexer.
>>
>>
>>--Loring
>>
>>
>>----- Original Message ----
>>> From: The Researcher <researcher0...@gmail.com>
>>> To: antlr-interest@antlr.org
>>
>>> Sent: Mon, July 11, 2011 12:55:05 PM
>>> Subject: Re: [antlr-interest] Can one identify the type of parser needed 
>>> for a
>>>given BNF grammar
>>>
>>> >
>>
>>> > Loring,
>>>
>>>
>>> Thanks for the feedback.
>>>
>>> While I now can  understand your answer, I still don't have enough
>>> experience  to  implement any approach correctly.
>>>
>>> As a help to others who are working on  this in private or interested in
>>> creating a C++  parser I will gloss  what Loring said because last week it
>>> would have mostly been Greek to me. If  I say something wrong, please
>>> correct. I am not afraid to make mistakes in  public so that others may 
>>> learn
>>> from them.
>>>
>>> The big thing that one has  to learn about parsing C++ is how to handle the
>>> ambiguities. That is the  dragon to be slayed here.
>>>
>>> 1. And then you have to figure out how to prune  the GLR-generated 
>>> "forests".
>>>
>>> GLR http://en.wikipedia.org/wiki/GLR_parser
>>>
>>> The reason GLR is considered  for C++ is that do to the ambiguities of C++
>>> most parsers in a typical  fashion have to couple the feedback from the
>>> parser to the lexer and  reference a symbol table to correctly parse C++.
>>> This is contrary to what is  taught in school and most of the early examples
>>> of parsing you will come  across. This allows them to produce an AST without
>>> ambiguities after the  parser. This is pushing the work to the front. GLR is
>>> one of the most  powerful parsing strategies and can parse C++ without
>>> coupling the parser to  the lexer, but does so by generating multiple
>>> subtrees for each ambiguity,  for which you must later prune out the
>>> ambiguities from the forests. This is  pushing the work to the back.
>>>
>>> 2. C++ is nasty; it can be parsed with  ANTLR (as shown by NEXT and David
>>> Wigg's
>>> adaptions of that  grammar).
>>>
>>> The ANTLR C++ grammar does so by creating a symbol table and  using
>>> predicates during the parse. In my initial conversions of the ANTLR  C
>>> grammar to C# I was able to convert it, but did not understand why it  did
>>> certain things at first. After researching it was clear that a lot of  the
>>> ambiguities of C++ arise when an name is encountered and then it's use (  a
>>> type, an identifier, etc)  has to be know in order to take the  appropriate
>>> branch. Thus the creation of the symbol tables and other tables  and the use
>>> of predicates. This pushes the work to the front.
>>>
>>> 3. I  believe that the right strategy with ANTLR is actually to use
>>> multi-pass  recognition to sort out the ambiguities.  That has not been  
>>> done
>>> yet.
>>> The problem is that C++ cannot be adequately described with a  context-free
>>> grammar; you have to do some context-sensitive processing to  resolve the
>>> syntax
>>> that is semantically ambiguous.
>>>
>>> It could/should  be possible to parse C++ with ANTLR without a symbol table
>>> and associated  predicates, and then in the AST analysis verify that the
>>> input C++ syntax is  valid C++ semantics.
>>> Remember that the BNF for a grammar does not guarantee  valid code/semantics
>>> only correct syntax. So the AST is analyzed to find  invalid semantics. I am
>>> not sure if this will require coupling the parser to  the lexer or the AST 
>>> to
>>> the parser, but if I go this route I will soon  know.
>>>
>>> I know this went off topic but was worth it.
>>>
>>> Any answer to  the original question is still  appreciated.
>>>
>>> Thanks
>>>
>>> Eric
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe:
>>>http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>
>>
>>List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>Unsubscribe: 
>>http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>
>
>

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.

[il-antlr-interest: 33130] Re: [antlr-interest] Can one identify the type of parser needed for a given BNF grammar

Reply via email to