Hi,

Two more advantages of a hand-written parser:
- You can actually debug the parser. No chance with JavaCC or ANTLR.
- Better tool support (refactoring, autocomplete).

> sorry for my somewhat ironic statement about you being the only one
> wanting a hand-written parser. To my surprise, it turns out I was wrong!

> Just curious, don't you use a separate tokenizing step in your
> hand-written parsers (I'm asking because of the literal "AND" above)?

Lexing (tokenizing, scanning) is done at a lower level. It can be hand-written, or done with a tool (for example StringTokenizer or JFlex). The boundary between tokenizing and parsing is soft. In my example, tokenizing is done in 'read(): a token'.

> I usually prefer a separate tokenizing step, if only to make testing
> easier.

Sure! I'm not sure how to do that in JavaCC or ANTLR, but it is probably possible there as well.

> context-sensitive tokenizing

I'm not sure what you are referring to. Keywords versus identifiers? Example token types are: 'integer value', 'decimal value', 'text value', 'operator', 'quoted identifier', 'name'. The keywords are well defined in Java, but for SQL I wouldn't decide whether a word is a keyword or an identifier while tokenizing. Remarks are usually silently eaten by the tokenizer (except for @deprecated in Javac).

> The final answer to this question is probably "whoever implements it
> gets to decide". For me, the easiest way to understand a parser would
> be the unit tests which demonstrate its functionality, anyway.

I fully agree. Some example parser code:

- Derby JavaCC source file (313 KB):
  http://svn.apache.org/repos/asf/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj
  (the generated .java files are 691 + 314 + 20 + 5 = 1030 KB)
- H2 hand-written parser (161 KB):
  http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/command/Parser.java

Thomas
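To make the tokenizer/parser split discussed above concrete, here is a minimal Java sketch. It is not taken from H2 or Derby. The class ConditionParser, its method names, and the tiny grammar (comparisons joined by AND) are hypothetical. The tokenizer is a separate static method, so it can be unit-tested on its own. It produces the token types listed above (integer value, decimal value, text value, operator, quoted identifier, name) and silently drops "--" remarks. It does not decide what is a keyword: AND reaches the parser as an ordinary name and is recognized there, case-insensitively, so a quoted "and" stays an identifier. As a simplification, read() here just returns the next token from a pre-built list instead of tokenizing on demand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: separate tokenizer + hand-written recursive-descent parser for "a = 1 AND b <> 'x'".
public class ConditionParser {
    // Token types from the mail; there is deliberately no KEYWORD type.
    public enum Type { INTEGER, DECIMAL, TEXT, OPERATOR, QUOTED_IDENTIFIER, NAME, END }
    public static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }
    private static final List<String> COMPARISONS = Arrays.asList("=", "<>", "<", ">", "<=", ">=");

    // Separate tokenizing step: a static method that can be unit-tested on its own.
    public static List<Token> tokenize(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            // "--" remarks are silently eaten by the tokenizer.
            if (sql.startsWith("--", i)) { while (i < n && sql.charAt(i) != '\n') i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {               // integer or decimal value
                while (i < n && Character.isDigit(sql.charAt(i))) i++;
                boolean decimal = i < n && sql.charAt(i) == '.';
                if (decimal) { i++; while (i < n && Character.isDigit(sql.charAt(i))) i++; }
                tokens.add(new Token(decimal ? Type.DECIMAL : Type.INTEGER, sql.substring(start, i)));
            } else if (c == '\'' || c == '"') {       // 'text value' or "quoted identifier"
                StringBuilder sb = new StringBuilder();
                i++;
                while (true) {
                    if (i >= n) throw new IllegalArgumentException("Unterminated quote at " + start);
                    char d = sql.charAt(i++);
                    if (d != c) { sb.append(d); continue; }
                    if (i < n && sql.charAt(i) == c) { sb.append(c); i++; } else break; // doubled quote = one quote
                }
                tokens.add(new Token(c == '\'' ? Type.TEXT : Type.QUOTED_IDENTIFIER, sb.toString()));
            } else if (Character.isLetter(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) i++;
                tokens.add(new Token(Type.NAME, sql.substring(start, i))); // keyword or identifier: undecided
            } else {                                  // operator: prefer <>, <=, >= over one char
                String two = i + 1 < n ? sql.substring(i, i + 2) : "";
                i += COMPARISONS.contains(two) ? 2 : 1;
                tokens.add(new Token(Type.OPERATOR, sql.substring(start, i)));
            }
        }
        tokens.add(new Token(Type.END, ""));
        return tokens;
    }

    private final List<Token> tokens;
    private int pos;
    public ConditionParser(String sql) { this.tokens = tokenize(sql); }
    // Returns the next token and advances (over a pre-built token list in this sketch).
    private Token read() { return tokens.get(pos++); }
    // Keywords are recognized by the parser, case-insensitively, from NAME tokens.
    private boolean readIf(String keyword) {
        Token t = tokens.get(pos);
        if (t.type == Type.NAME && t.text.equalsIgnoreCase(keyword)) { pos++; return true; }
        return false;
    }

    // condition := comparison { AND comparison }; returns a normalized, parenthesized form.
    public String parse() {
        String result = parseComparison();
        while (readIf("AND")) result = "(" + result + " AND " + parseComparison() + ")";
        Token t = read();
        if (t.type != Type.END) throw new IllegalArgumentException("Unexpected token: " + t.text);
        return result;
    }

    // comparison := operand ( = | <> | < | > | <= | >= ) operand
    private String parseComparison() {
        String left = parseOperand();
        Token op = read();
        if (op.type != Type.OPERATOR || !COMPARISONS.contains(op.text))
            throw new IllegalArgumentException("Expected comparison operator, got: " + op.text);
        return left + " " + op.text + " " + parseOperand();
    }

    private String parseOperand() {
        Token t = read();
        switch (t.type) {
            case INTEGER: case DECIMAL: case NAME: return t.text;
            case TEXT: return "'" + t.text.replace("'", "''") + "'";
            case QUOTED_IDENTIFIER: return "\"" + t.text.replace("\"", "\"\"") + "\"";
            default: throw new IllegalArgumentException("Unexpected token: " + t.text);
        }
    }
    public static void main(String[] args) {
        // Prints: (name = 'O''Brien' AND id >= 10)
        System.out.println(new ConditionParser("name = 'O''Brien' and id >= 10 -- remark").parse());
    }
}
```

Because tokenize() is separate, a unit test can check its output directly without involving the parser: ConditionParser.tokenize("a1 = 2.5") returns the list [NAME:a1, OPERATOR:=, DECIMAL:2.5, END:].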
