Last year, work was carried out off list and two partial / demo 
implementations, one with JFlex + CUP and one with JavaCC, were produced. As 
the two developers who participated had to concentrate on other work, this 
has not progressed since.

My assessment of the whole project can be summarized as follows:

1. If we want just to improve the tokenizer, JFlex seems to be OK. It seems 
the license is not an issue as the generated files can be released under 
BSD. Work done by Loic can be completed and used to replace our existing 
tokenizer fairly easily.

2. If we want to introduce a new parser, a lot more than what the current 
crop of parser generators can do is necessary in order to achieve the 
current level of performance and simplify future development. I won't get 
into the details here but I have previously outlined what has to be done. It 
is a fairly large project to develop a specialized generator / compiler on 
top of CUP.

Fred


----- Original Message ----- 
From: "Blaine Simpson" <[EMAIL PROTECTED]>
To: <hsqldb-developers@lists.sourceforge.net>
Sent: 18 October 2005 15:58
Subject: [Hsqldb-developers] Survey of Scanners and Parsers


I'm finally catching up to our Lexer/Parser thread after a year and a half.
I made a SQL interpreter with lex and yacc about 20 years ago, which was used
in a real product.  My immediate goal is to figure out which tools we are
most likely to use and learn them.  Yesterday I reviewed the available open
source scanners and parsers after reading what Campbell and others had to
say.  Here are my estimates of the suitability of the available products for
our use.  In most cases, I don't mention the several products which are no
longer actively used, which are intentionally limited (i.e., unsuitable for
complex languages), or which have non-Java requirements for build or
run-time.

Knowing the other important goals for HSQLDB in the short and long term, I
don't know if we'll get enough people to commit to seeing through a rewrite
of the primary HSQLDB interpreter/parser.  I'm going to learn the tools
anyway though, in part because I intend to make a PL/SQL engine for HSQLDB,
and I don't think that anything but a trivial subset would be possible
without using a real scanner and/or parser.


Scanners

    JFlex.  GPL.  I like it.  Input looks very intuitive... at least for
    somebody who used to use lex.  Documentation and examples look great.
    Lots of people think very highly of it.  Change log shows that they took
    the effort to follow Java conventions (e.g. output class naming, Ant
    builds).

    JLex.  GPL-compatible.  Not seeing much activity in the past couple of
    years.  JFlex is a "rewrite" of JLex and can be run in JLex-compatibility
    mode.  According to the JFlex authors, JFlex can do everything that JLex
    could do, but faster and better.  Not wanting to take the time to run my
    own comparisons, I'm inclined to believe that until I hear somebody
    disagree.

    SableCC.  Looks like a good product.  Supports EBNF.  Not as widely used
    as JFlex, but still actively used.  Input looks similar to lex input.
    Compares well to Antlr, but I can't find a comparison to JLex.  Looks
    like I'd have to download the distro to find out what kind of license it
    has.

    Coco/R.  "Slightly extended" GPL.  I'm not sure what the input
    specification files look like for the Coco/R scanner, but if they are ATG
    files, they look much more difficult to maintain and much less intuitive
    than lex/flex/jlex/jflex input files.  Looks like docs are only available
    in PDF (I love HTML docs!), but maybe the HTML docs just aren't
    advertised as well as the PDFs.  The documentation about the Java distro
    gives me the feeling that good Java design is not their top priority.

    Antlr.  BSD license.  Poor performance.

Parsers

    JavaCC (formerly "Jack").  BSD.  Poor docs.  I haven't seen anybody
    compare CUP and JavaCC and favor JavaCC.  I saw somebody complain that
    JavaCC was commercial, so it may have been commercial at one time.

    Jacc.  BSD.  Pure Java.  I don't see much use of it for the past few
    years.  I'm concerned about support, fixes for newly found bugs, etc.

    CUP.  GPL-compatible.  Very popular.  Highly regarded.  Definitely works
    with JFlex.

    Beaver.  BSD.  Not as popular as CUP, but still actively used.
    Definitely works with JFlex.  Allegedly the fastest performance possible
    for the class of parsers that we're looking at.  Input spec looks
    intuitive yet powerful.  Good Java design.

    SableCC.  Not as widely used as CUP.  I don't know about compatibility
    with JFlex.  See SableCC listing in Scanner list.

    Coco/R.  See Coco/R listing in Scanner list.


Please reply with recommendations against any of these.  I don't want to
waste my time learning products that I won't ever use.  I've already started
learning JFlex.  CUP and Beaver are next on my to-learn list, depending on
future discussion and findings.


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
hsqldb-developers mailing list
hsqldb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hsqldb-developers 


