On 2010 Oct 27, at 13:26, John B. Brodie wrote:

> On Tue, 2010-10-26 at 20:07 -0700, Trevor John Thompson wrote:
>> Greetings.
>> I continue to wrestle with rewrite rules for AST construction. I am trying 
>> to treat semicolon and newline as equivalent separators, and gather a 
>> sequence of expressions as children of a single AST node. The grammar looks 
>> like
>> =======
>> grammar Test;
>> options {output=AST;}
>> prog:        expr EOF!;
>> expr:        (term->term) (((NL|SC) term)+ -> ^(NL $expr term+))?;
>> term:        ID
>>      |       ->ID    // empty treated as no-name ID
>>      ;
>> fragment
>> SP   :       ' '|'\t';
>> SC   :       ';';
>> ID   :       SP*
>>              ('a'..'z'|'A'..'Z'|'_')
>>              ('0'..'9'|'a'..'z'|'A'..'Z'|'_')*
>>      ;
>> NL   :       ('\r'|'\n')+;
>> =======
>> The problem is that if the sequence does *not* include newline, then i get 
>> RewriteEmptyStreamException on the NL in the rewrite rule; i.e. "a;\n" 
>> works, but "a;" does not.
>> 
>> What particularly baffles me is that if i build the node with any token 
>> other than NL or SC (e.g. SP), then the rule *always* works.
>> 
>> Could someone please explain what is going on?
> 
> ANTLR will create a root token when that token does not appear on the
> left hand side of the rewrite operator (the ->). this is known as an
> `imaginary token`. imaginary tokens do not appear in the input token
> stream.
> 
> But any token that appears on both sides of the -> must be present in
> the input token stream as you have encountered.
> 
> So you want to create an NL token as the root even though it may not
> appear in the input token stream. Therefore:
> 
> expr:   term (((x=NL|x=SC) term)+ -> ^(NL[$x] term+))?;
> 
> the [...] stuff on the right hand side of the rewrite tells ANTLR to
> always construct a new imaginary token that is derived from a real
> token. the stuff inside the [] tells ANTLR how to initialize the
> imaginary token. so in the above case "a;" will end up with a tree whose
> root has the token type NL but the token text ";" and the position
> information of the SC.
> 
> SP as root node worked because it did not appear on the left hand side
> of the rewrite so ANTLR just knew you wanted to construct an imaginary
> token (but with no text or position information initialized).
> 
> you really want to use the [] form of token construction so that the
> position information will get set so that later error messages will be
> (hopefully) more meaningful.
> 
> overriding the text of the NL to be ";" is, to me, rather unexpected. so
> i would suggest 
> 
> expr : term ((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] term+) ;
> 
> where EXPR_LIST is an imaginary token type that you have specified in
> the tokens{} section of your grammar.
> 
> hope this helps...
>   -jbb
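
For reference, a minimal sketch of where the EXPR_LIST declaration mentioned 
above would sit, assuming the ANTLR 3 tokens{} syntax and the rest of the 
Test grammar unchanged. The expr rule is copied from the suggestion above; 
as written it requires at least one separator, so a lone term would still 
need the optional form from the original grammar. I have not run this.
=======
grammar Test;
options {output=AST;}
tokens {EXPR_LIST;}     // imaginary token type; no lexer rule produces it
prog:        expr EOF!;
// root is always EXPR_LIST; $x supplies position info from a matched NL or SC
expr:        term ((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] term+);
term:        ID
     |       ->ID    // empty treated as no-name ID
     ;
fragment
SP   :       ' '|'\t';
SC   :       ';';
ID   :       SP*
             ('a'..'z'|'A'..'Z'|'_')
             ('0'..'9'|'a'..'z'|'A'..'Z'|'_')*
     ;
NL   :       ('\r'|'\n')+;
=======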


Thank you very much for your clear, detailed, and thoughtful explanation.
I appreciate now that tree rewriting is not a matter of arbitrarily tossing 
around nodes; ANTLR carefully maintains internal information to assist the 
parser in generating good diagnostics. I may be catching on. . .

TJ
--
Trevor John Thompson

Quidquid Latine dictum sit, altum videtur.

