Sigh. 'class' is a keyword in C++, so your first statement is wrong. What you propose is exactly what was done in the NEXT/Wigg parser; the cost is in parser synpreds (sempreds in Wigg's grammars; ANTLR 2 did not hoist synpreds, so Wigg had to work from the PCCTS generated code in getting a workable ANTLR 2 grammar) and tremendous backtracking overhead. I rather suspect that the result is considerably slower than the multi-pass approach would be; the NEXT/Wigg grammars do lots of backtracking as a matter of course. From studying the Wigg grammar, I estimate that there is at least a factor of two overhead from backtracking; tree walkers are faster than parsers (excluding actions) as the result of having a more canonical form (less lookahead; k=1 is normal).
By separating semantic analysis from syntactic analysis, you can support more comprehensive error reporting--multiple errors reported without having to implement parser error recovery--with line/column numbers (stored in the tokens). Being in a rush to report the first error encountered is silly: most of us can afford to wait an extra few milliseconds (or perhaps longer; more comprehensive error reporting costs in terms of display time). Symbol tables are neither magic nor a "one size fits all" technology--there is no one true SymbolTable class. The information stored in symbol table entries varies across languages, and there may be multiple symbol tables for a given language (having a type table separate from the identifier symbol table is common). Folding preprocessing into the lexer can be done; I think Jim Idle did that for C, but I could be mistaken. Whether the added complexity is worth the improved processing time over having a separate preprocessor, I have no idea. Certainly, that is best done as an optimization step; using a separate preprocessor for early development simplifies the problem of finding bugs in the lexer. It is never a good idea to unnecessarily add error sources. So: you are wrong on almost all counts. You need more practical experience with language translation, and a bit of "book learning" would not be out of order. --Loring >________________________________ >From: Douglas Godfrey <douglasgodf...@gmail.com> >To: Loring Craymer <lgcray...@yahoo.com> >Cc: The Researcher <researcher0...@gmail.com>; antlr-interest@antlr.org >Sent: Monday, July 11, 2011 4:33 PM >Subject: Re: [antlr-interest] Can one identify the type of parser needed for a >given BNF grammar > > >If the SymbolTable is not built by the Parser then you cannot generate the >correct AST >for a class definition vs a function call. Also you cannot generate prompt >errors close to >the point where the error occurs in the parse. 90+% of all semantic >ambiguities can be >resolved by a SymbolTable in the Parser. Most C++ programs will not have any >statements >that cannot be resolved inline in the parser and would need to be resolved in >a later AST >TreeParser phase. The 90% case should not pay the compile time cost of a >separate >TreeParser just to resolve ambiguities that can be handled by the Parser. > >No feedback is required for the Lexer. The Lexer can resolve all pre-processor >statements >before the Parser is invoked. The Lexer uses it's own primitive SymbolTable to >handle >text substitutions and macros. Again, 90+% of all C++ programs do not use any >complex >macros where a second lexer pass would be needed. For simple includes and text >substitutions the Lexer can handle them inline. In 10% of the cases The Lexer >needs 2 >passes where the first pass processes all includes, macros and substitutions >and the >second pass generates the token stream for the Parser. > > >On Mon, Jul 11, 2011 at 5:16 PM, Loring Craymer <lgcray...@yahoo.com> wrote: > >On 3.) : The parser just recognizes syntax and ignores semantic ambiguities; >>then in a first tree walker pass, the symbol tables are constructed; a second >>tree walker pass uses the symbol table information to resolve ambiguities >>(which >>of several interpretations of valid syntax is correct) and rewrites the tree >>into an unambiguous form. Then you have finished recognizing C++ and can >>continue with further processing. Never, never, never would I suggest adding >>a >>feedback mechanism that couples parser to lexer. >> >> >>--Loring >> >> >>----- Original Message ---- >>> From: The Researcher <researcher0...@gmail.com> >>> To: antlr-interest@antlr.org >> >>> Sent: Mon, July 11, 2011 12:55:05 PM >>> Subject: Re: [antlr-interest] Can one identify the type of parser needed >>> for a >>>given BNF grammar >>> >>> > >> >>> > Loring, >>> >>> >>> Thanks for the feedback. >>> >>> While I now can understand your answer, I still don't have enough >>> experience to implement any approach correctly. >>> >>> As a help to others who are working on this in private or interested in >>> creating a C++ parser I will gloss what Loring said because last week it >>> would have mostly been Greek to me. If I say something wrong, please >>> correct. I am not afraid to make mistakes in public so that others may >>> learn >>> from them. >>> >>> The big thing that one has to learn about parsing C++ is how to handle the >>> ambiguities. That is the dragon to be slayed here. >>> >>> 1. And then you have to figure out how to prune the GLR-generated >>> "forests". >>> >>> GLR http://en.wikipedia.org/wiki/GLR_parser >>> >>> The reason GLR is considered for C++ is that do to the ambiguities of C++ >>> most parsers in a typical fashion have to couple the feedback from the >>> parser to the lexer and reference a symbol table to correctly parse C++. >>> This is contrary to what is taught in school and most of the early examples >>> of parsing you will come across. This allows them to produce an AST without >>> ambiguities after the parser. This is pushing the work to the front. GLR is >>> one of the most powerful parsing strategies and can parse C++ without >>> coupling the parser to the lexer, but does so by generating multiple >>> subtrees for each ambiguity, for which you must later prune out the >>> ambiguities from the forests. This is pushing the work to the back. >>> >>> 2. C++ is nasty; it can be parsed with ANTLR (as shown by NEXT and David >>> Wigg's >>> adaptions of that grammar). >>> >>> The ANTLR C++ grammar does so by creating a symbol table and using >>> predicates during the parse. In my initial conversions of the ANTLR C >>> grammar to C# I was able to convert it, but did not understand why it did >>> certain things at first. After researching it was clear that a lot of the >>> ambiguities of C++ arise when an name is encountered and then it's use ( a >>> type, an identifier, etc) has to be know in order to take the appropriate >>> branch. Thus the creation of the symbol tables and other tables and the use >>> of predicates. This pushes the work to the front. >>> >>> 3. I believe that the right strategy with ANTLR is actually to use >>> multi-pass recognition to sort out the ambiguities. That has not been >>> done >>> yet. >>> The problem is that C++ cannot be adequately described with a context-free >>> grammar; you have to do some context-sensitive processing to resolve the >>> syntax >>> that is semantically ambiguous. >>> >>> It could/should be possible to parse C++ with ANTLR without a symbol table >>> and associated predicates, and then in the AST analysis verify that the >>> input C++ syntax is valid C++ semantics. >>> Remember that the BNF for a grammar does not guarantee valid code/semantics >>> only correct syntax. So the AST is analyzed to find invalid semantics. I am >>> not sure if this will require coupling the parser to the lexer or the AST >>> to >>> the parser, but if I go this route I will soon know. >>> >>> I know this went off topic but was worth it. >>> >>> Any answer to the original question is still appreciated. >>> >>> Thanks >>> >>> Eric >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: >>>http://www.antlr.org/mailman/options/antlr-interest/your-email-address >>> >> >>List: http://www.antlr.org/mailman/listinfo/antlr-interest >>Unsubscribe: >>http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.