Hi ANTLR list,
I'm using ANTLR in my master's thesis to parse the RTF file format.
So far everything works: I can parse simple RTF documents. The next
step is to generate an AST, and this is where the problems start. I
seem to have an error in one of my rewrite rules. As the grammar below
shows, I want to nest the groups: a "group" should become a child of
the enclosing "group" whenever the subrule group matches inside the
rule group, i.e. sg=group space* -> ^(GROUP $sg+). You can try the
grammar with a simple RTF snippet, e.g.
{\pard\fs32\b NOTES\par}
{\pard\fs26 Recently I skimmed {\i Structure and Interpretation of
Computer Programs}, by Sussman and Abelson, and I think there should
have been more pictures.
\line I like pictures. Is that so na\'efve?
\par}
from: http://search.cpan.org/~sburke/RTF-Writer/lib/RTF/Cookbook.pod
Do you have any comments or ideas about what I'm doing wrong?
Best regards, Yves
parser grammar RtfParser;
options {
    language = Java;
    tokenVocab = RtfLexer;
    output = AST;
}
@header {
    package org.moflon.moca.rtf.parser;
}
prog : (space | v=group)+ -> ^(PROG $v+);
plaintext : TEXT | AZ | HYPHEN | UNDERSCORE | TILDE | APOSTROPHE | INT ;
command : BACKSLASH n=AZ v=INT? -> ^(COMMAND $n $v?) ;
escape : (BACKSLASH (
        v=OPEN -> ^(ESCAPE $v) |
        v=CLOSE -> ^(ESCAPE $v) |
        v=BACKSLASH -> ^(ESCAPE $v) |
        v=HYPHEN -> ^(ESCAPE $v) |
        v=UNDERSCORE -> ^(ESCAPE $v) |
        v=TILDE -> ^(ESCAPE $v)) |
    v=HEXESCAPE -> ^(ESCAPE $v)) ;
space : v=WS -> ^(WS[v, "WS"] $v) |
    v=TAB -> ^(TAB[v, "TAB"] $v) |
    NEWLINE -> ;
group : (OPEN
    space*
    (
        sg=group space* -> ^(GROUP $sg+) |
        p=plaintext space* |
        ((c1=command)+ space) |
        (e=escape) space*
    )+
    ((c2=command)+)?
    CLOSE) -> ^(GROUP $p+ $c1+ $e+ $c2+);

lexer grammar RtfLexer;
tokens {
    PROG;
    COMMAND;
    ESCAPE;
    TEXT;
    GROUP;
    WS;
    TAB;
}
@header {
    package org.moflon.moca.rtf.parser;
}
WS : ' '+ ;
TAB : '\t'+ ;
NEWLINE : ('\n' | '\r')+ {skip();} ;
fragment DIGIT : '0'..'9' ;
INT : '-' DIGIT+ | DIGIT+;
HEXESCAPE : BACKSLASH APOSTROPHE (AF | DIGIT) (AF | DIGIT);
BACKSLASH : '\u005C' ;
APOSTROPHE : '\u0027' ;
TILDE : '~' ;
HYPHEN : '-' ;
UNDERSCORE : '_' ;
OPEN : '{' ;
CLOSE : '}' ;
fragment AF : 'a'..'f' | 'A'..'F' ;
fragment GZ : 'g'..'z' | 'G'..'Z' ;
AZ : (AF | GZ)+ ;
TEXT : ('\u0021'..'\u0026' /* skipping ' */ |
    '\u0028'..'\u002C' /* skipping - */ |
    '\u002E'..'\u002F' /* skipping 0-9 */ |
    '\u003A'..'\u0040' /* skipping A-Z */ |
    '\u005B' /* skipping \ */ |
    '\u005D'..'\u005E' /* skipping _ */ |
    '\u0060' /* skipping a-z */ /* skipping { */ |
    '\u007C' /* skipping } */ /* skipping ~ */ |
    '\u007F'..'\uFFFF') ;