> > Loring,
Thanks for the feedback. While I now can understand your answer, I still don't have enough experience to implement any approach correctly. As a help to others who are working on this in private or interested in creating a C++ parser I will gloss what Loring said because last week it would have mostly been Greek to me. If I say something wrong, please correct. I am not afraid to make mistakes in public so that others may learn from them. The big thing that one has to learn about parsing C++ is how to handle the ambiguities. That is the dragon to be slayed here. 1. And then you have to figure out how to prune the GLR-generated "forests". GLR http://en.wikipedia.org/wiki/GLR_parser The reason GLR is considered for C++ is that do to the ambiguities of C++ most parsers in a typical fashion have to couple the feedback from the parser to the lexer and reference a symbol table to correctly parse C++. This is contrary to what is taught in school and most of the early examples of parsing you will come across. This allows them to produce an AST without ambiguities after the parser. This is pushing the work to the front. GLR is one of the most powerful parsing strategies and can parse C++ without coupling the parser to the lexer, but does so by generating multiple subtrees for each ambiguity, for which you must later prune out the ambiguities from the forests. This is pushing the work to the back. 2. C++ is nasty; it can be parsed with ANTLR (as shown by NEXT and David Wigg's adaptions of that grammar). The ANTLR C++ grammar does so by creating a symbol table and using predicates during the parse. In my initial conversions of the ANTLR C grammar to C# I was able to convert it, but did not understand why it did certain things at first. After researching it was clear that a lot of the ambiguities of C++ arise when an name is encountered and then it's use ( a type, an identifier, etc) has to be know in order to take the appropriate branch. Thus the creation of the symbol tables and other tables and the use of predicates. This pushes the work to the front. 3. I believe that the right strategy with ANTLR is actually to use multi-pass recognition to sort out the ambiguities. That has not been done yet. The problem is that C++ cannot be adequately described with a context-free grammar; you have to do some context-sensitive processing to resolve the syntax that is semantically ambiguous. It could/should be possible to parse C++ with ANTLR without a symbol table and associated predicates, and then in the AST analysis verify that the input C++ syntax is valid C++ semantics. Remember that the BNF for a grammar does not guarantee valid code/semantics only correct syntax. So the AST is analyzed to find invalid semantics. I am not sure if this will require coupling the parser to the lexer or the AST to the parser, but if I go this route I will soon know. I know this went off topic but was worth it. Any answer to the original question is still appreciated. Thanks Eric List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.