Hi All, I updated the PHP target to work with Antlr 3.4/PHP 5.3. This code is available at http://domemtech.com/code/antlrphpruntime.zip for the next month or so, until it—hopefully—finds a permanent location. I plan on making more changes when I start rewriting the runtime tests for the target and figuring out what in the world is going on with this target.
NOTE: Someone needs to take control of this mess, delete the many forked copies of this target, and put this in one official location. The development of this target is absolutely atrocious. This code is not in one repository, but at least four. I really do not understand why people cannot make private repositories on their machines instead of proliferating multiple public repositories. It is not easy figuring out who made what change when, why, and whether those changes are useful. There may be more forked copies of the PHP runtime out in the wild, but who knows.

For better or worse, I chose code base (3) listed below for development, and made a copy of it on my machine. I chose that code base because the author sent a cogent email explaining his changes, and because it was changed more recently than any of the other code bases.

WHERE IS ANTLR PHP LOCATED?

Here are the four different repositories:

(1) http://antlrphpruntime.googlecode.com (http://code.google.com/p/antlrphpruntime/ ) – SVN. This code is officially anointed in the Antlr targets web page http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets as “the one and only PHP target”. It isn’t clear what Antlr version or PHP version this code targets. This code was last changed on June 19, 2010 (code) by Eugeny Yakimovitch. Several other unimportant changes were made more recently (e.g., June 26, 2011).

(2) https://github.com/rollxx/antlr-php-runtime – GIT. This code was probably forked from (1), but since there are no embedded version ids in the source code, I can’t tell you what was done. The code was last changed March 21, 2010 by rollex. At the top of the page, the author says: “This version in not maintained. Please visit the main project page listed below for the current version “, and gives a link to (1). Unfortunately, it’s hard to say whether the changes were successfully merged back into (1), but there are check-ins in late March by rollex to (1).
(3) https://github.com/beberlei/antlr-php-runtime – GIT. Benjamin Eberlei noted in an email to the “Antlr Interest” and “Antlr dev” lists (http://markmail.org/message/zbdc2ni3mfjioens#query:+page:1+mid:zbdc2ni3mfjioens+state:results http://www.antlr.org/pipermail/antlr-interest/2010-September/039653.html http://markmail.org/message/v7wq2a6wvsjlwl4n ) that development of the source code in (1) has been halted since Feb 2010. Eberlei modified this code to fix several bugs and improve the quality of the code, and checked it in. This repository was forked from (2) (unclear when), and last modified in September 2010 by beberlei.

(4) http://code.google.com/p/phpandallthat/ – SVN. Eugeny.Yakimovitch, who is on the list of developers for (1), has an unknown fork of (1) that is yet another implementation of the PHP runtime. The latest change to that source code was on September 10, 2010. Great!

NOTE: As far as I know, there is no PHP target listed in the Fisheye view of the Antlr repository (linked via http://antlr.org).

DISCUSSIONS ON THE PHP TARGET:

* Aug 30, 2011 http://markmail.org/message/73fo5jg5a36qhv5p
* May 30, 2011 http://www.antlr.org/pipermail/antlr-interest/2011-May/041725.html
* Sep 6/8, 2010 http://markmail.org/message/zbdc2ni3mfjioens http://markmail.org/message/v7wq2a6wvsjlwl4n
* May 6, 2010 http://www.antlr.org/pipermail/antlr-dev/2009-May/002292.html
* Oct 9, 2009 http://markmail.org/message/ewmppl7u4b3jnwgh

WHAT CHANGES DID I MAKE TO ANTLR PHP?

Most of my changes are in Php.stg, to move it forward to Antlr 3.4, and to handle lexers with semantic predicates, like this grammar:

    lexer grammar BigParLexer;

    options {
        backtrack = true;
        filter = true;
    }

    @members {
        int open = 0;
    }

    P
    @init { open = 1; }
        : '/*'
          ( {open > 0}?=>   // keep repeating `( ... )*` as long as open > 0
            ( ( { !((input.LA(1) == '/' && input.LA(2) == '*') ||
                    (input.LA(1) == '*' && input.LA(2) == '/')) }?=> . )   // match anything other than delimiters.
            | '/*' {open++;}
            | '*/' {open--;}
            )
          )*
        ;

The lexer for this grammar accepts input like ‘/* hi /* there */ */’ as one token. NB: this grammar doesn’t work exactly as written for the PHP target, as I explain below.

* Rolled changes from Java.stg, Revision ID: 8204, into Php.stg. The link to the code for Java.stg used in the modification of Php.stg is: https://fisheye2.atlassian.com/browse/antlr/tool/src/main/resources/org/antlr/codegen/templates/Java/Java.stg
* Fixed problems with backtracking.
* Fixed missing $input declaration for semantic predicates.
* Fixed missing ‘$’ for ‘alt...’ state variables in DFA generated code.
* Added a makefile to construct antlr.jar. I could not find any “build.xml” file anywhere. And, I cannot stand Ant.

WHAT DOES NOT WORK?

Not all the tests in .../runtime/Php/test/Antlr/Tests work. Many of these are terrible test cases, some of which cause the Antlr tool to output warnings, and others that crash the tool altogether. I don't know the status of AST construction, tree parsing, etc. There is code for tree construction, but I haven't tested it.

WHAT DON'T I LIKE ABOUT THE PHP TARGET?

* PHP does not automatically convert an integer into a string and vice versa for tests; variables must be preceded with “$”; and “?>” ends PHP code even in a comment. Input streams in Antlr are composed of integers, not characters. “input->LA()” returns a number. When you want to test the lookahead in a semantic predicate, you must convert the character you are testing into a number, or convert LA() into a string. So, in the above grammar BigParLexer, “input.LA(1) == ‘/’” won’t work, and PHP won’t complain! It must be converted to a target-specific syntax, e.g., “\$input->LA(1) == 47”.
* In the wisdom of the developers of PHP, “?>” ends the PHP code section even if it is on a comment line, e.g., “// you are screwed ?> boo hoo.” Consequently, some of the templates in Php.stg are missing code to generate descriptions in comments.
If the grammar contains “?>”, as in some of the test cases, PHP will barf on the generated code. There must be a way to convert the description into a PHP-safe format, but I don’t know what that would be.
* THERE IS NO DOCUMENTATION!

WHAT DO I LIKE ABOUT THE PHP TARGET?

PHP does not have a “64K byte code per method limit” as in Java. When writing a lexer grammar with semantic predicates, it seems extremely easy to generate Java code that will not compile (e.g., BigParLexer.g, but with delimiters with more characters, e.g., “<script> .... </script>”). But, PHP works!

Ken Domino

List: http://www.antlr.org/mailman/listinfo/antlr-interest
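
For readers who want to see what the BigParLexer predicates compute, the counting logic can be sketched in plain Java, outside of ANTLR. This is only an illustration under assumptions: the class name NestedCommentScanner, the method matchNested, and the constants SLASH and STAR are hypothetical and are not part of the ANTLR runtime or of Php.stg. The sketch compares lookahead as integers (47 for '/', 42 for '*'), which mirrors the point above that LA() returns a number rather than a character.

```java
// Hypothetical sketch; not part of the ANTLR runtime or of Php.stg.
final class NestedCommentScanner {
    static final int SLASH = 47; // ASCII code of '/'
    static final int STAR = 42;  // ASCII code of '*'

    /**
     * Returns the index just past the nested comment that starts at
     * `start`, or -1 if no comment starts there or it is never closed.
     */
    static int matchNested(String input, int start) {
        if (!at(input, start, SLASH, STAR)) {
            return -1;
        }
        int open = 1;          // plays the role of `open` in BigParLexer
        int i = start + 2;
        while (open > 0) {     // corresponds to the gate {open > 0}?=>
            if (i >= input.length()) {
                return -1;     // ran out of input with comments still open
            }
            if (at(input, i, SLASH, STAR)) {        // '/*' {open++;}
                open++;
                i += 2;
            } else if (at(input, i, STAR, SLASH)) { // '*/' {open--;}
                open--;
                i += 2;
            } else {
                i++;           // anything other than a delimiter
            }
        }
        return i;
    }

    // Two-character lookahead, compared as integers the way LA(k) reports them.
    private static boolean at(String s, int i, int c1, int c2) {
        return i + 1 < s.length() && s.charAt(i) == c1 && s.charAt(i + 1) == c2;
    }

    public static void main(String[] args) {
        String text = "/* hi /* there */ */ rest";
        System.out.println(matchNested(text, 0)); // prints 20
    }
}
```

In a grammar action for the PHP target, the same comparisons would have to be written against numeric codes, along the lines of the “\$input->LA(1) == 47” form quoted above.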

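
On the open question of making rule descriptions safe to emit inside PHP comments: one possible approach, offered only as an idea and not tested against Php.stg, is to rewrite the description before it reaches the template so that it can no longer contain the two-character sequence “?>”. The sketch below assumes a hypothetical helper named phpSafeComment on the Java side of the tool; how such a helper would be wired into the code generator is not covered here and is an open assumption.

```java
// Hypothetical helper; the name and the place where it would be called
// from the code generator are assumptions, not existing ANTLR API.
final class PhpCommentSanitizer {
    /**
     * Returns a copy of `description` that is safe to put after `//` in
     * generated PHP: "?>" is broken up with a space, and line breaks are
     * replaced by spaces so the text stays on one comment line.
     */
    static String phpSafeComment(String description) {
        return description
                .replace("?>", "? >")   // break up the PHP closing tag
                .replace("\r", " ")
                .replace("\n", " ");
    }

    public static void main(String[] args) {
        // The example comment text from the post above.
        System.out.println("// " + phpSafeComment("you are screwed ?> boo hoo."));
    }
}
```

The design choice here is to alter the comment text rather than drop it, so the generated PHP keeps a readable description. The cost is that the description no longer matches the grammar text character for character.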