Hi,

On Dec 5, 2:33 am, Rémi Forax <[EMAIL PROTECTED]> wrote:
> Here, you have an implicit rule that says tokenList is a String,
> but it's up to the grammar writer to enforce that rule.

Interesting point.  However, the parser generator can certainly track
this information and generate a warning / error for this issue, making
it a non-issue.  For example, we can check that the return types
(excluding void) are the same for all rule functions with the same LHS
non-terminal, then use that information to validate the arguments
wherever those non-terminals are used.

Java APT is quite nice in that it provides such information.  I will
probably add this check in the next release :-D
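To make the idea concrete, here is a minimal sketch of that consistency
check.  The @Rule annotation and its "lhs" attribute are hypothetical
names invented for this example (not actual CookCC API), and the check
runs via runtime reflection here, whereas the real check would run at
compile time via APT:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.*;

public class RuleCheck {
    // Hypothetical annotation marking a method as a grammar rule for "lhs".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Rule { String lhs(); }

    // Example rule methods: both "expr" productions return Integer,
    // but the "term" productions disagree (Integer vs String).
    static class Demo {
        @Rule(lhs = "expr") Integer exprAdd() { return 0; }
        @Rule(lhs = "expr") Integer exprTerm() { return 0; }
        @Rule(lhs = "term") Integer termNum() { return 0; }
        @Rule(lhs = "term") String termBad() { return ""; }
    }

    // Group rule methods by LHS non-terminal and report every LHS whose
    // methods declare different (non-void) return types.
    static List<String> inconsistentLhs(Class<?> parser) {
        Map<String, Set<Class<?>>> types = new TreeMap<>();
        for (Method m : parser.getDeclaredMethods()) {
            Rule r = m.getAnnotation(Rule.class);
            if (r == null || m.getReturnType() == void.class) continue;
            types.computeIfAbsent(r.lhs(), k -> new HashSet<>())
                 .add(m.getReturnType());
        }
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Set<Class<?>>> e : types.entrySet())
            if (e.getValue().size() > 1) bad.add(e.getKey());
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(inconsistentLhs(Demo.class)); // prints "[term]"
    }
}
```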

> Moreover, when you write a LR grammar, you have to deal
> with LR conflicts. In general, if you find why there is a conflict,
> solve it is easy, but find it is sometimes not that easy.
>
> One Tatoo's author use it to teach compiler course
> (He used to use Yacc before). He tells me that
> students are more able to resolve such conflicts because
> Tatoo format provide a clean separation (of concern)
> between the grammar part and the semantics part.
>
> So for a grammar writer it's important to see easily all productions
> of a non terminal and if a non terminal can derive to empty.
> With your format, productions that have the same left non-terminal
> can be anywhere so a grammar writer have to scan the whole program to find 
> them.

I honestly just don't see why this is a problem.  CookCC supports
several input AND output file formats, so you can convert Java input
into an XML or yacc grammar.  The grammar analysis output also lists
the grammar (which is necessary anyway to see why conflicts are
generated).  In particular, I chose to write the grammar that way as a
shorthand.

> Furthermore, the fact to separate the grammar from the semantics allow you
> to use  different languages C, Java, C# etc. to specify the semantics
> with only one grammar format.

True, but Java developers don't care about C / C# at all; they only
care about Java, so this feature is of little use to them.  Besides,
the annotation input approach can easily be extended to other
languages if there are language-specific comment + protocol parsers.
For instance, SPARK uses Python doc strings, and Java annotations
themselves initially started out as JavaDoc comment tags.

It is true that writing a good comment + protocol parser is a non-
trivial task, but I think it is easier than writing language-specific
plugins (mentioned below).  CookCC does have XML and Yacc input as a
backup approach for inputting grammars.
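The core of such a parser is really just convention plus string
handling.  A minimal sketch, using the common "lhs ::= rhs" rule
syntax as the convention (this is a generic illustration, not any
particular tool's format):

```java
import java.util.*;

public class RuleStringParser {
    // Parse a doc-comment / doc-string style rule of the form
    // "lhs ::= sym1 sym2 ..." into its LHS and RHS symbols.
    static Map.Entry<String, List<String>> parseRule(String rule) {
        String[] parts = rule.split("::=", 2);
        if (parts.length != 2)
            throw new IllegalArgumentException("missing '::=' in: " + rule);
        String lhs = parts[0].trim();
        List<String> rhs = new ArrayList<>();
        for (String sym : parts[1].trim().split("\\s+"))
            if (!sym.isEmpty()) rhs.add(sym);
        return Map.entry(lhs, rhs);
    }

    public static void main(String[] args) {
        // prints "expr=[expr, +, term]"
        System.out.println(parseRule("expr ::= expr + term"));
    }
}
```

The hard part in practice is error reporting and alternatives /
precedence, not the basic extraction shown here.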

> > On the contrary, I think your code precisely listed the shortcomings
> > CookCC is trying to address
>
> > 1. Separate lexer and parser, so users have to hook them up by
> > themselves.
>
> Sometimes you only want a lexer or a parser.
> The eclipse Java environment (JDT) is a good example of lexers and parsers
> used separately or together.

CookCC does allow lexer-only or parser-only use (it detects which is
present).  If both a lexer and a parser are present, they are
automatically hooked up.

> > 2. Function names are specified in the grammar file has to be exactly
> > matched against function names in Java files.  This creates a name
> > matching issue.  The problem can only be discovered after compiling
> > the grammar file (to generate java code), and then compiling the java
> > source files together.
>
> Yes, you can have mismatches.
> But the generated code generates interfaces, so it's easy to spot those
> mismatches.
> And frankly, this is not a big deal. Now, we all use IDE, so
> fixing improper name is really easy.
> Futhermore,  I have a prototype of an eclipse plugin that allows
> refactoring between the grammar file  and  the  Java  
> code.http://gforgeigm.univ-mlv.fr/projects/tatoo-eclipse
> (I'm not sure it currently works well because  we recently change the
> Tatoo AST runtime support)

Actually, this is somewhat of a big deal.  As a C/C++ developer, one
thing I find annoying is keeping prototype declarations in sync with
the actual implementation.  Without refactoring tools, I avoid name /
prototype changes as much as possible, even when the functions aren't
used anywhere else.  In contrast, Java has no such pain, which is one
reason why I love Java so much :)

Likewise, one often-heard complaint about XML-based data binding /
code injection (e.g. the Spring framework, the XUL toolkit, etc.) is
the mismatch between names specified in XML and the actual Java class /
function names.  Even though IntelliJ IDEA provides a safe search
before refactoring (Eclipse doesn't do so for XML), it is not as fool-
proof as pure Java function hook-ups.

The issue here is similar.  Certainly it is possible to deal with it,
but it can get annoying sometimes.
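The difference in a nutshell (a generic illustration of string-based
hookup, not Spring-specific): a direct call is checked by the
compiler, while a method referenced by a string name only fails when
the lookup actually runs.

```java
import java.lang.reflect.Method;

public class NameMismatch {
    public static class Bean {
        public void init() { System.out.println("initialized"); }
    }

    public static void main(String[] args) {
        Bean bean = new Bean();
        bean.init();  // direct call: a typo here would be a compile error

        // String-based hookup, as in XML configuration: the typo "iniit"
        // compiles fine and only blows up when the lookup runs.
        try {
            Method m = Bean.class.getMethod("iniit");
            m.invoke(bean);
        } catch (NoSuchMethodException e) {
            System.out.println("runtime failure: " + e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```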

> > 3. Several different files.  This creates a file management issue.  At
> > least I couldn't tell without looking at build.xml that which files
> > are required for the specific lexer / parser.  This is a particularly
> > an issue in cases where multiple small grammar parsers are used in a
> > project, since java files are batch compiled.
>
> Tatoo ebnf file can embody all specs (lexer, parser, AST etc) in one file
> or in multiple depending if you want to specify semantics in the grammar
> file
> or not.
> By example, the Java 1.0 grammar is 
> here:http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/java/jls.eb...
> and the semantics to use Tatoo as a javac frontend is 
> here:http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/java/types....
>
> In that way, you can reuse a grammar that already exist and tested but
> hooks a different semantics.

Interesting, but you can do that with the Java annotation approach
too.  For example:

@CookCCOption
class JavaParser extends Parser { ... }

Here, JavaParser contains mostly blank / abstract functions for the
parser part.  We can then have the actual handling code implemented in
child classes:

class JavaByteCodeCompiler extends JavaParser { ... }
class JavaAstGenerator extends JavaParser { ... }

IMO, an AST generator (I am still doing some research to find a good
design for one) is probably good enough for language parsers, so such
an approach is probably not that necessary.
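As a concrete (if simplified) sketch of that pattern: the names and
the single-rule "grammar" below are invented for illustration, but
they show how one abstract base fixes the grammar shape while each
subclass supplies its own semantics.

```java
public class GrammarReuse {
    // Base class: the grammar shape (one rule here), semantics left abstract.
    static abstract class ExprParser {
        abstract String onAdd(String left, String right);  // expr ::= expr + term
        String parse() {
            // Stand-in for real parsing: drive the semantic action directly.
            return onAdd("1", "2");
        }
    }

    // One subclass evaluates the expression ...
    static class Evaluator extends ExprParser {
        String onAdd(String l, String r) {
            return Integer.toString(Integer.parseInt(l) + Integer.parseInt(r));
        }
    }

    // ... another builds an AST-like string, reusing the same grammar.
    static class AstPrinter extends ExprParser {
        String onAdd(String l, String r) { return "(+ " + l + " " + r + ")"; }
    }

    public static void main(String[] args) {
        System.out.println(new Evaluator().parse());   // prints "3"
        System.out.println(new AstPrinter().parse());  // prints "(+ 1 2)"
    }
}
```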

> > In contrast, with CookCC java annotation input, lexer and parser
> > function names can be arbitrary without worrying about spelling
> > mistakes.  Lexer and parser are automatically hooked up.  There is
> > only a single input file to deal with (and you can have color syntax
> > highlighting without any extra efforts).
>
> I've a parser generator, Tatoo ebnf file grammar is written using Tatoo,
> so I've already syntax highlighting without any effort :)
> (See the eclipse plugin).

I didn't have to write any plugins :-D  Besides, does your plugin work
for C# / Python / C++ too (assuming that you are putting everything in
a single file)?

It is far easier to write a comment + prototype parser than to create
a very intelligent plugin on a par with good language editors.

>
> ...
>
> >> Using annotations instead of a DSL is not in my opinion a good idea,
> >> at least until annotations are String based.
> > Patrick
>
> >> Rémi
>
> > Sorry for being contentious :)
>
> No problem, I've a bulletproof plate armor :)
> > Heng Yuan
>
> Rémi Forax

Thanks for the discussion :)  You've got some interesting points I'd
never thought about.

Heng Yuan

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to jvm-languages@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en
-~----------~----~----~----~------~----~------~--~---
