[antlr-dev] Every grammar an output grammar

Loring Craymer Wed, 02 Apr 2008 14:50:22 -0700

One of the problems that we hoped that ANTLR 3 would solve was that of text 
output--generating text from ANTLR 2 was an unpleasant experience.  
StringTemplate helps considerably, and one of the inchoate ideas that Ter was 
grappling with at the ANTLR 3 cabal was that of an "output grammar" that would 
provide a mapping between an ANTLR grammar and a template group.  This idea 
never quite gelled; the ANTLR 3 "template" output option was a first attempt 
to go in this direction.


Last week's "template generation" discussion came just as I was finishing a 
pretty printer for Yggdrasil and started me thinking about the problem again.  
I finally came up with an approach that seems to provide a solution and have 
implemented that in Yggdrasil.  After trying other syntaxes, I ended up with an 
annotation that "mirrors" the template syntax:  a <foo> suffix assigns a 
token's text to a "foo" attribute, while <<bar>> references a bar template 
(fills a slot that holds slots; i.e., a template).  This approach could easily 
be incorporated into the baseline ANTLR 3; the whole idea of output grammars 
seems to be a big step forward.  I have documented this for Yggdrasil as 
follows:


Output annotations and automated template generation

If the grammar option “buildText” is set to true, Yggdrasil will
automatically output templates according to the model:

  * String templates are decorated as trees mirroring the input elements
    of the grammar.  For each rule in a grammar, there may be a
    corresponding string template definition, either explicitly specified
    or (by default) with the same name as the rule.  Upon entry to a rule,
    the current template value is pushed onto a stack.  If there is a
    template for the rule, then an instance of that template is created
    and current is set to that instance; otherwise, the current template
    remains in effect.

  * Unless otherwise specified, values are added to the “body” attribute
    of the current template.

  * Grammar annotations for template building generally take the form of
    <key> or <<template>> suffixes.  A rule defined with the name
    foo<<bar>> has “bar” as the rule template (the syntax here is limited
    to a single argument).  A rule, token, or instantiated attribute
    reference of the form tok<t> assigns tok to the t attribute.  “<->” is
    the template equivalent of “!”.  Rule references have the additional
    form ruleRef<<templateName>> or <key<templateName>>; templateName
    overrides the template that would otherwise have been invoked.  If
    templateName is “-”, then no template is created for the invoked rule
    and text items are added to the current template.  [So “<->” is “do
    not add text” and “<<->>” is “do not invoke a new template”.]

  * Syntactic predicates build templates, and the recognizer class tracks
    the last syntactic predicate.  Synpred templates are not added to the
    output template; they are tracked as an aid to debugging.

Within a rule, [EMAIL PROTECTED]<>” references the active template and can
be assigned to a Text attribute.  Grammar attributes are not added to the
output template except through actions or through the attribute algebra.
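
The stack discipline described in the model above can be illustrated with a
small Python sketch.  This is not Yggdrasil code: the class and method names
(Template, TemplateBuilder, enter_rule, add_text, exit_rule) are hypothetical,
and the behavior on rule exit (adding the finished template to the parent
template under the "body" key) is an assumption, since the description only
covers rule entry.

```python
class Template:
    """Hypothetical stand-in for a template instance: a name plus attributes."""

    def __init__(self, name):
        self.name = name
        self.attrs = {}  # attribute name -> list of values

    def add(self, key, value):
        self.attrs.setdefault(key, []).append(value)


class TemplateBuilder:
    """Sketch of the push/instantiate/pop model; not the real recognizer."""

    def __init__(self, template_names):
        self.template_names = set(template_names)
        self.root = Template("root")
        self.current = self.root
        self.stack = []

    def enter_rule(self, rule_name, template_name=None):
        # On rule entry the current template is always pushed.
        self.stack.append(self.current)
        # By default the template has the same name as the rule;
        # a <<templateName>> annotation overrides it.
        name = rule_name if template_name is None else template_name
        # "-" (the <<->> form) or a missing template keeps the current one.
        if name != "-" and name in self.template_names:
            self.current = Template(name)

    def add_text(self, text, key="body"):
        # key="-" models the <-> annotation: the text is not added.
        if key != "-":
            self.current.add(key, text)

    def exit_rule(self, key="body"):
        # Assumption: a template created for the rule is added to the
        # template that was current when the rule was entered.
        finished = self.current
        self.current = self.stack.pop()
        if finished is not self.current and key != "-":
            self.current.add(key, finished)


# Only ruleDef and orAlt are defined here, so block and the first
# alternative fall through to the enclosing ruleDef instance.
builder = TemplateBuilder({"ruleDef", "orAlt"})
builder.enter_rule("rule", "ruleDef")       # rule<<ruleDef>>
builder.add_text("expr", key="name")        # RULE_REF<name>
builder.enter_rule("block")
builder.enter_rule("alternative")
builder.add_text("ID")
builder.exit_rule()
builder.add_text("|", key="-")              # OR<->
builder.enter_rule("alternative", "orAlt")  # alternative<<orAlt>>
builder.add_text("INT")
builder.exit_rule()
builder.exit_rule()
builder.exit_rule()
```

After this run, the root template's body holds one ruleDef instance whose
name attribute is "expr" and whose body holds the text "ID" followed by an
orAlt instance.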

Given this scheme, any grammar can become an output grammar, and a variety
of template groups can be built in order to do such things as build text for
displaying a parse tree, build XML output, pretty print, and the like.
Parse tree and XML output forms can be built automatically.  To build a
pretty printer or other customized output format, the natural approach is to
start with a parse tree format, then fill out individual templates and
annotate the grammar.  For a rule named “rule”, one parse tree template is

rule(body) ::= <<
"rule"
        <body; separator = "\n">
>>

and this template can be easily generated for all rules in a grammar.
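
As a rough illustration of how such default templates might be generated, the
following Python sketch emits one parse tree template per rule name.  The
function names and the hard-coded rule list are hypothetical; a real
generator would take the rule names from the grammar itself.

```python
def parse_tree_template(rule_name):
    """Return the default parse tree template text for one rule."""
    return (
        f"{rule_name}(body) ::= <<\n"
        f"\"{rule_name}\"\n"
        "        <body; separator = \"\\n\">\n"
        ">>\n"
    )


def parse_tree_group(rule_names):
    """Join the per-rule templates, separated by blank lines."""
    return "\n".join(parse_tree_template(name) for name in rule_names)


# Hypothetical input: rule names taken from the example grammar in this post.
print(parse_tree_group(["rule", "block", "alternative"]))
```

Individual entries in the generated group can then be replaced by hand-written
templates as the pretty printer is filled out.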

Example output grammar

This is a simplified subset of some definitions from antlr.g:


rule<<ruleDef>>
        :
        (
                DOC_COMMENT
        )?
        (
                "protected"
                |
                        "public"
                |
                        "private"
                |
                        "fragment"
        )?
        (
                RULE_REF<name>
                |
                        TOKEN_REF<name>
        )
        COLON
        block
        SEMI
        ;


block
        :
        alternative
        (
                OR<-> alternative<<orAlt>>
        )*
        ;


and this is the corresponding template set for a pretty printer:


ruleDef(name, body) ::= <<
<name>
        :
        <body; separator = "\n">
        ;
>>


block(body, suffix) ::= <<
(
        <body; separator = "\n">
)<suffix>
>>


alternative(body) ::= <<
<body; separator = "\n">
>>


orAlt(body) ::= <<
|
        <body; separator = "\n">
>>



Note that a fairly minimal annotation of the .g file maps the
input onto a rich template set; that is, the development effort for
producing a pretty printer is focused primarily on building
templates, not on annotating the grammar.





      