[jvm-l] Re: CookCC lexer/parser generator

Rémi Forax Fri, 05 Dec 2008 02:36:52 -0800

coconut wrote:
> Hi,
>
> On Nov 28, 11:30 am, Rémi Forax <[EMAIL PROTECTED]> wrote:
>   
>> We threw this approach away 3 months later because
>>   - it's not really readable thus difficult to debug :)
>>     
> I beg to differ.  I find it much easier to read, write and debug.
>   
I've no problem with multiplicity :)
But I just think that sharing experience is important too.
>   
>>   - the Java type of each non-terminal must be the same and with this format
>>     that type is repeated on each method that defines a rule containing
>>     the non-terminal.
>>     
> I just don't see why it has to.  I wrote the yacc parser using Java
> annotation approach and it involves a number of different data types
>
> http://code.google.com/p/cookcc/source/browse/trunk/src/org/yuanheng/cookcc/input/yacc/YaccParser.java
>   
OK, let me try to be clearer.
I've extracted two productions from your example:


        @Rule (lhs = "tokenList", rhs = "tokenList TOKEN", args = "1 2")
        String parseTokenList (String list, String token)
        {
                return list + " " + token;
        }

        @Rule (lhs = "tokenList", rhs = "TOKEN", args = "1")
        String parseTokenList (String token)
        {
                return token;
        }

Here, you have an implicit rule that says tokenList is a String,
but it's up to the grammar writer to enforce that rule.
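
To illustrate the point, here is a minimal sketch of a variant that compiles
without complaint although the two tokenList rules disagree on the type,
together with a reflection check that a grammar writer would have to run to
notice it. The @Rule annotation below is a hypothetical stand-in declared
locally (not CookCC's own class), and RuleTypeCheck, BadParser and
findTypeMismatches are names made up for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RuleTypeCheck {
    // Hypothetical stand-in for an annotation like CookCC's @Rule;
    // only the lhs/rhs attributes are modelled here.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Rule {
        String lhs();
        String rhs();
    }

    // javac accepts this class, yet "tokenList" is a String in one rule
    // and a List<String> in the other.
    static class BadParser {
        @Rule(lhs = "tokenList", rhs = "tokenList TOKEN")
        String parseTokenList(String list, String token) {
            return list + " " + token;
        }

        @Rule(lhs = "tokenList", rhs = "TOKEN")
        List<String> parseTokenList(String token) {
            return Collections.singletonList(token);
        }
    }

    // Returns one message per pair of rules that share a left-hand side
    // but declare different return types.
    static List<String> findTypeMismatches(Class<?> parser) {
        Map<String, Class<?>> seen = new TreeMap<>();
        List<String> errors = new ArrayList<>();
        for (Method m : parser.getDeclaredMethods()) {
            Rule rule = m.getAnnotation(Rule.class);
            if (rule == null) {
                continue; // not a grammar rule
            }
            Class<?> previous = seen.putIfAbsent(rule.lhs(), m.getReturnType());
            if (previous != null && previous != m.getReturnType()) {
                errors.add(rule.lhs() + ": conflicting return types");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(findTypeMismatches(BadParser.class));
    }
}
```

With the type of each non-terminal declared in one place, this kind of check
would not need to be written by hand.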


Moreover, when you write an LR grammar, you have to deal
with LR conflicts. In general, once you have found why there is a conflict,
solving it is easy, but finding the cause is sometimes not that easy.

One of Tatoo's authors uses it to teach a compiler course
(he used Yacc before). He tells me that
students are better able to resolve such conflicts because
the Tatoo format provides a clean separation (of concerns)
between the grammar part and the semantics part.

So for a grammar writer it's important to easily see all the productions
of a non-terminal and whether a non-terminal can derive the empty string.
With your format, productions that have the same left-hand non-terminal
can be anywhere, so a grammar writer has to scan the whole program to find them.

Example:
        @Rule (lhs = "action", rhs = "complete_action ACTION_CODE", args = "2")
        String parseAction (String action)
        {
                return action;
        }

Here, if I read only this method, action cannot derive the empty string.
I have to find another method to see that action can derive the empty string.

        @Rules (rules = {
                @Rule (lhs = "action", rhs = ""),
                @Rule (lhs = "complete_action", rhs = "complete_action PARTIAL_ACTION"),
                @Rule (lhs = "complete_action", rhs = "")
        })
        String parseAction ()
        {
                return null;
        }

That's why I find your format not grammar-writer friendly.
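
When the productions are listed together as plain data, the question "can
this non-terminal derive the empty string?" becomes a small fixed-point
computation. The sketch below is only an illustration of that idea: the
Production class and the nullable method are names invented for this
example, not part of CookCC or Tatoo. The four productions are the ones
from the two methods above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NullableCheck {
    // One production: lhs -> rhs symbols. An empty rhs is an empty production.
    static final class Production {
        final String lhs;
        final List<String> rhs;

        Production(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = Arrays.asList(rhs);
        }
    }

    // Fixed-point iteration: a non-terminal is nullable if one of its
    // productions is empty or consists only of nullable symbols.
    // Terminals are never added to the set, so they are never nullable.
    static Set<String> nullable(List<Production> grammar) {
        Set<String> result = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Production p : grammar) {
                if (!result.contains(p.lhs) && result.containsAll(p.rhs)) {
                    result.add(p.lhs);
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Production> grammar = Arrays.asList(
            new Production("action", "complete_action", "ACTION_CODE"),
            new Production("action"),
            new Production("complete_action", "complete_action", "PARTIAL_ACTION"),
            new Production("complete_action"));
        System.out.println(nullable(grammar)); // prints [action, complete_action]
    }
}
```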

Furthermore, separating the grammar from the semantics allows you
to use different languages (C, Java, C#, etc.) to specify the semantics
with only one grammar format.

>   
>>   - there is a third point but i'm not able to recall it now :(
>>
>> So we decide to write grammars as we used to :)
>> And to link the grammar to the semantics, we add a name at the end of
>> each production
>> (between curly braces) and a way to declare the type of each terminal
>> and non-terminal at one place.
>>
>> Here is an example of a calc:
>> http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/calc-ast/ca...
>>
>> and it generates an interface that you can implement to provide the
>> semantics:
>> http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/calc-ast/sr...
>>     
>
> On the contrary, I think your code precisely listed the shortcomings
> CookCC is trying to address
>
> 1. Separate lexer and parser, so users have to hook them up by
> themselves.
>   
Sometimes you only want a lexer or a parser.
The Eclipse Java environment (JDT) is a good example of lexers and parsers
used separately or together.

> 2. Function names are specified in the grammar file has to be exactly
> matched against function names in Java files.  This creates a name
> matching issue.  The problem can only be discovered after compiling
> the grammar file (to generate java code), and then compiling the java
> source files together.
>   
Yes, you can have mismatches.
But the generator produces interfaces, so it's easy to spot those
mismatches.
And frankly, this is not a big deal. We all use IDEs now, so
fixing an improper name is really easy.
Furthermore, I have a prototype of an Eclipse plugin that allows
refactoring between the grammar file and the Java code.
http://gforgeigm.univ-mlv.fr/projects/tatoo-eclipse
(I'm not sure it currently works well because we recently changed the
Tatoo AST runtime support)

> 3. Several different files.  This creates a file management issue.  At
> least I couldn't tell without looking at build.xml that which files
> are required for the specific lexer / parser.  This is a particularly
> an issue in cases where multiple small grammar parsers are used in a
> project, since java files are batch compiled.
>   
A Tatoo ebnf file can hold all the specs (lexer, parser, AST, etc.) in one file,
or they can be split across several files, depending on whether you want to
specify the semantics in the grammar file or not.
For example, the Java 1.0 grammar is here:
http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/java/jls.ebnf?root=tatoo&view=markup
and the semantics to use Tatoo as a javac frontend is here:
http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/java/types.ebnf?root=tatoo&view=markup

That way, you can reuse a grammar that already exists and is tested, but
hook in different semantics.
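
As a rough sketch of what "one grammar, several semantics" looks like on the
Java side: the interface below is a hypothetical stand-in for a generated
one (it is not actual Tatoo output), with one method per named production.
The production notation in the comments is informal, and the small driver
method only replays the reductions a parser would perform for the input
"1 + 2 * 3". All names here are invented for the example.

```java
public class SharedGrammarDemo {
    // Hypothetical stand-in for a generated interface: one method per
    // named production; the type of the non-terminal is declared once (T).
    interface ExprSemantics<T> {
        T num(int value);         // expr -> NUM        {num}
        T plus(T left, T right);  // expr -> expr + expr {plus}
        T star(T left, T right);  // expr -> expr * expr {star}
    }

    // First semantics: evaluate the expression.
    static class Evaluator implements ExprSemantics<Integer> {
        @Override public Integer num(int value) { return value; }
        @Override public Integer plus(Integer left, Integer right) { return left + right; }
        @Override public Integer star(Integer left, Integer right) { return left * right; }
    }

    // Second semantics for the same grammar: print with explicit parentheses.
    static class Printer implements ExprSemantics<String> {
        @Override public String num(int value) { return Integer.toString(value); }
        @Override public String plus(String left, String right) { return "(" + left + " + " + right + ")"; }
        @Override public String star(String left, String right) { return "(" + left + " * " + right + ")"; }
    }

    // Stand-in for the generated parser: replays, in bottom-up order,
    // the reductions for "1 + 2 * 3" with the usual operator precedence.
    static <T> T parseOnePlusTwoTimesThree(ExprSemantics<T> semantics) {
        T one = semantics.num(1);
        T two = semantics.num(2);
        T three = semantics.num(3);
        T product = semantics.star(two, three);
        return semantics.plus(one, product);
    }

    public static void main(String[] args) {
        System.out.println(parseOnePlusTwoTimesThree(new Evaluator())); // prints 7
        System.out.println(parseOnePlusTwoTimesThree(new Printer()));   // prints (1 + (2 * 3))
    }
}
```

A misspelled or mistyped method in either implementation is rejected by
javac because of @Override, which is the kind of mismatch detection
mentioned earlier.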

> In contrast, with CookCC java annotation input, lexer and parser
> function names can be arbitrary without worrying about spelling
> mistakes.  Lexer and parser are automatically hooked up.  There is
> only a single input file to deal with (and you can have color syntax
> highlighting without any extra efforts).
>   
I have a parser generator, and the Tatoo ebnf grammar is itself written using Tatoo,
so I already have syntax highlighting without any effort :)
(See the Eclipse plugin.)

...
>   
>> Using annotations instead of a DSL is not in my opinion a good idea,
>> at least until annotations are String based.
>>
>> > Patrick
>>
>> Rémi
>>     
>
> Sorry for being contentious :)
>   
No problem, I have bulletproof plate armor :)
> Heng Yuan
>   
Rémi Forax

