Hi,
I'm doing pretty well recognizing LaTeX commands, but now I'm at the
stage where I want to capture the "text", and I'm having trouble defining
"everything else".
Basically, I currently define a LaTeX document as a sequence of
commands (as I define them below), possibly separated by WS, and everything
that's not a command is "text". The problem I keep running into is that when
I define "text" generously, it starts grabbing tokens that belong to
commands. Any help would be greatly appreciated!
Thanks in advance,
Pavel
I'm including the grammar I have so far and the document I'm hoping to parse.
Here's the grammar:
grammar PGTeX;
doc : (command WS?)+ EOF;
command : escWord cWord+ ( sWord+ cWord*)?;
sWord : '[' word ']';
cWord : '{' word '}';
escWord : '\\' word;
word : WORD;
WORD: ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'*')+;
WS : ( ' ' | '\t'| '\r' | '\n' )+;
COMMENT
: '%' (~('\n'|'\r'))* {$channel = HIDDEN;};
And here's the document:
\documentclass{book}%
\usepackage{amsfonts}
\usepackage{amsmath}%
\newtheorem{summary}[theorem]{Summary}
\begin{document}
\chapter*{Intro}
Book starts here $x^{2}+y^{2}=1$. Here's an intersting faction:
\begin{equation}
\int_{0}^{1}\sin xdx=4
\end{equation}
\end{document}
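
To make the goal concrete, here is a rough sketch in Python (not ANTLR) of the split I'm after. It is only an illustration under assumptions: the regex mirrors my `command` rule above (an escaped word, one or more `{word}` groups, then optionally `[word]` groups and more `{word}` groups), comments are dropped the same way as in my COMMENT rule, and the names `COMMAND_RE`, `COMMENT_RE` and `split_commands_and_text` are made up for this example.

```python
import re

# Same character set as the WORD lexer rule in the grammar above.
WORD = r"[-A-Za-z0-9*]+"

# Mirrors: command : escWord cWord+ ( sWord+ cWord* )? ;
COMMAND_RE = re.compile(
    r"\\" + WORD
    + r"(?:\{" + WORD + r"\})+"
    + r"(?:(?:\[" + WORD + r"\])+(?:\{" + WORD + r"\})*)?"
)

# Mirrors the COMMENT rule: '%' up to (not including) the end of the line.
COMMENT_RE = re.compile(r"%[^\r\n]*")


def split_commands_and_text(source):
    """Return a list of ("command", s) and ("text", s) pieces in order."""
    source = COMMENT_RE.sub("", source)
    pieces = []
    pos = 0
    for match in COMMAND_RE.finditer(source):
        # Whatever lies between two commands is "text".
        text = source[pos:match.start()].strip()
        if text:
            pieces.append(("text", text))
        pieces.append(("command", match.group()))
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        pieces.append(("text", tail))
    return pieces


if __name__ == "__main__":
    sample = (
        "\\chapter*{Intro}\n"
        "Book starts here $x^{2}+y^{2}=1$.\n"
        "\\begin{equation}\n"
    )
    for kind, value in split_commands_and_text(sample):
        print(kind, repr(value))
```

In this sketch commands are matched first and "text" is only ever the gap between two command matches, so text cannot grab anything that belongs to a command. Note that under my current `command` rule, `\int_{0}^{1}\sin xdx=4` comes out entirely as text, because neither `\int` nor `\sin` is followed directly by a `{word}` group.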