Last year, work was carried out off list and two partial / demo implementations were produced, one with JFlex + CUP and one with JavaCC. As the two developers who participated had to concentrate on other work, this has not progressed since.

My assessment of the whole project can be summarized as follows:

1. If we just want to improve the tokenizer, JFlex seems to be OK. It seems the license is not an issue, as the generated files can be released under BSD. Work done by Loic can be completed and used to replace our existing tokenizer fairly easily.

2. If we want to introduce a new parser, a lot more than what the current crop of parser generators can do is necessary in order to achieve the current level of performance and simplify future development. I won't get into the details here but I have previously outlined what has to be done. It is a fairly large project to develop a specialized generator / compiler on top of CUP.

Fred

----- Original Message -----
From: "Blaine Simpson" <[EMAIL PROTECTED]>
To: <hsqldb-developers@lists.sourceforge.net>
Sent: 18 October 2005 15:58
Subject: [Hsqldb-developers] Survey of Scanners and Parsers

I'm finally catching up to our Lexer/Parser thread after a year and a half. I made a SQL interpreter with lex and yacc about 20 years ago, which was used in a real product. My immediate goal is to figure out which tools we are most likely to use and learn them.

Yesterday I reviewed the available open source scanners and parsers after reading what Campbell and others had to say. Here are my estimations of the suitability of the available products for our use. In most cases, I don't mention the several products which are no longer actively used, which are intentionally limited (i.e., unsuitable for complex languages), or which have non-Java requirements for build or run-time.

Knowing the other important goals for HSQLDB in the short and long term, I don't know if we'll get enough people to commit to seeing through a rewrite of the primary HSQLDB interpreter/parser. I'm going to learn the tools anyway though, in part because I intend to make a PL/SQL engine for HSQLDB, and I don't think that anything but a trivial subset would be possible without using a real scanner and/or parser.

Scanners

JFlex. GPL. I like it. Input looks very intuitive... at least for somebody who used to use lex. Documentation and examples look great. Lots of people think very highly of it. Change log shows that they took the effort to follow Java conventions (e.g. output class naming, Ant builds).

JLex. GPL-compatible. Not seeing much activity in the past couple of years. JFlex is a "rewrite" of JLex and can be run in JLex-compatibility mode. According to the JFlex authors, JFlex can do everything that JLex could do, but faster and better. Not wanting to take the time to run my own comparisons, I'm inclined to believe that until I hear somebody disagree.

SableCC. Looks like a good product. Supports EBNF. Not as widely used as JFlex, but still actively used. Input looks similar to lex input. Compares well to Antlr, but I can't find a comparison to JLex. Looks like I'd have to download the distro to find out what kind of license it has.

Coco/R. "Slightly extended" GPL. I'm not sure what the input specification files look like for the Coco/R scanner, but if they are ATG files, they look much more difficult to maintain and much less intuitive than lex/flex/jlex/jflex input files. Looks like docs are only available in PDF (I love HTML docs!), but maybe the HTML docs just aren't advertised as well as the PDFs. The documentation about the Java distro gives me the feeling that good Java design is not their top priority.

Antlr. BSD license. Poor performance.

Parsers

JavaCC (formerly "Jack"). BSD. Poor docs. I haven't seen anybody compare CUP and JavaCC and favor JavaCC. I saw somebody complain that JavaCC was commercial, so it may have been commercial at one time.

Jacc. BSD. Pure Java. I don't see much use of it for the past few years. I'm concerned about support, fixes for newly found bugs, etc.

CUP. GPL-compatible. Very popular. Highly regarded. Definitely works with JFlex.

Beaver. BSD. Not as popular as CUP, but still actively used. Definitely works with JFlex.
Allegedly the fastest performance possible for the class of parsers that we're looking at. Input spec looks intuitive yet powerful. Good Java design.

SableCC. Not as widely used as CUP. I don't know about compatibility with JFlex. See SableCC listing in Scanner list.

Coco/R. See Coco/R listing in Scanner list.

Please reply with recommendations against any of these. I don't want to waste my time learning products that I won't ever use. I've already started learning JFlex. CUP and Beaver are next on my to-learn list, depending on future discussion and findings.

-------------------------------------------------------
This SF.Net email is sponsored by: Power Architecture Resource Center: Free content, downloads, discussions, and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
hsqldb-developers mailing list
hsqldb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers