The attached grammar illustrates two different patterns that could
work to identify the markers.
However, it is an open question whether a valid marker can appear
without a prefix, a suffix, or any escaped characters. Since it is
not clear what would be valid, I have left the grammar as an incomplete
example.
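To make the open question concrete, here is a minimal sketch in plain Java (java.util.regex, not ANTLR) of two readings of a marker. The lenient reading treats escapes on either side as optional, as the MARKER rule does. The strict reading requires at least one escape on each side, as the first alternative of WORD does. The class name MarkerShapes and its method names are illustrative assumptions; the marker names are copied from the MARKER rule.

```java
import java.util.regex.Pattern;

public class MarkerShapes {
    // One escape: a backslash followed by any single character (mirrors the ESC fragment).
    private static final String ESC = "\\\\.";
    // Marker names as listed in the MARKER rule of the attached grammar.
    private static final String NAMES = "(?:prenamee|prenames|surname)";

    // Lenient reading: ESC* name ESC* (escapes are optional on both sides).
    private static final Pattern LENIENT =
        Pattern.compile("(?s)(?:" + ESC + ")*" + NAMES + "(?:" + ESC + ")*");
    // Strict reading: ESC+ name ESC+ (at least one escape on both sides).
    private static final Pattern STRICT =
        Pattern.compile("(?s)(?:" + ESC + ")+" + NAMES + "(?:" + ESC + ")+");

    public static boolean lenient(String s) {
        return LENIENT.matcher(s).matches();
    }

    public static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "surname",                        // bare marker, no escapes
            "\\a\\b\\bsurname\\d\\e\\e\\e",   // first line of the sample input
            "\\asurname",                     // prefix only
            "surnam"                          // not a marker at all
        };
        for (String s : samples) {
            System.out.println(s + "  lenient=" + lenient(s) + "  strict=" + strict(s));
        }
    }
}
```

Under the strict reading a bare surname is rejected, and under the lenient reading it is accepted; which of the two is correct is exactly what remains unclear.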
------ Original Message (Sunday, August 15, 2010 10:42:28 AM) From: Joachim Schrod ------
Subject: Re: [antlr-interest] Best practice to handle Lexer backtracking demand
Gerald Rosenberg writes:
> How is XXXX guaranteed to be unambiguous with any other fragment of
> aaaaXXXXbbbb? That is, how can you be sure that a fragment like aaaX or
> XXXXb will never match a different start marker?
The data generating service guarantees it. (It escapes characters if
any complete marker substring happens to be in the data.)
> Is there a case distinction, as implied, or something more
> interesting? Is the distinction the same for the end marker?
No and no. The markers are strings like `prenames', `prenamee',
`surnames', etc.
Joachim
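Whether two markers can share characters in the input can be checked mechanically for a known marker set. The following is a minimal sketch in plain Java, assuming only the three marker strings quoted above (the full list is not given). The class MarkerOverlap and the method overlapLength are hypothetical names introduced for illustration. The method reports the length of the longest proper suffix of one marker that is also a prefix of another.

```java
public class MarkerOverlap {
    // Length of the longest proper suffix of 'first' that is also a prefix of 'second'.
    // A result greater than zero means the two markers can share characters
    // when 'second' starts before 'first' has ended.
    public static int overlapLength(String first, String second) {
        int max = Math.min(first.length(), second.length()) - 1;
        for (int n = max; n >= 1; n--) {
            if (first.endsWith(second.substring(0, n))) {
                return n;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Assumption: only the markers named in the message; the real set may be larger.
        String[] markers = { "prenames", "prenamee", "surnames" };
        for (String a : markers) {
            for (String b : markers) {
                int n = overlapLength(a, b);
                if (!a.equals(b) && n > 0) {
                    System.out.println(a + " then " + b + " share " + n + " character(s)");
                }
            }
        }
    }
}
```

For these three strings the only overlap between two different markers is the single s shared when prenames is followed by surnames, so an unescaped run such as prenamesurnames is the kind of input the generating service would presumably have to rule out.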
--
Gerald B. Rosenberg, Esq.
NewTechLaw
260 Sheridan Ave., Suite 208
Palo Alto, CA 94306-2009
650.325.2100 (office) / 650.703.1724 (cell)
650.325.2107 (facsimile)
www.newtechlaw.com
grammar test;
options {
    output = AST;
    language = Java;
}
tokens {
    MWORD;
    UWORD;
}
@parser::header {
    package test.gen;
}
@lexer::header {
    package test.gen;
}
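@lexer::members {
    // Queue of emitted tokens, so that a single lexer rule can emit several tokens.
    List<Token> tokens = new ArrayList<Token>();
    @Override
    public void emit(Token token) {
        super.emit(token);
        tokens.add(token);
    }
    // Run the lexer once more (filling the queue via emit), then return the
    // oldest queued token; an empty queue means end of input.
    @Override
    public Token nextToken() {
        super.nextToken();
        if (tokens.size() == 0) {
            return Token.EOF_TOKEN;
        }
        return tokens.remove(0);
    }
    // Placeholder for an external helper; typed as Object until ILexerHelper exists.
    private /* ILexerHelper */ Object helper;
    public void setHelper(/* ILexerHelper */ Object helper) {
        this.helper = helper;
    }
}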
start1
    : text1+ EOF
    ;
start2
    : text2+ EOF
    ;
text1
    : MWORD ( w+=WORD | w+=UWORD )+ MWORD
        {
            // analyze $w
        }
    | WORD
    ;
text2 : MARKER ;
WORD
    : beg=ESC+ mid=CHAR+ end=ESC+
        {
            $beg.setType(UWORD);
            // $beg.setText(helper.uEsc($beg.getText()));
            emit($beg);
            $mid.setType(MWORD);
            // $mid.setType(helper.determineType($mid));
            emit($mid);
            $end.setType(UWORD);
            // $end.setText(helper.uEsc($end.getText()));
            emit($end);
        }
    | CHAR+
    ;
MARKER
    : ESC*
      ( // options { k=10; } :
        'prenamee'
      | 'prenames'
      | 'surname'
      )
      ESC*
    ;
fragment
ESC
    : '\\'
      ( 'n'
      | 'r'
      | 't'
      | 'b'
      | 'f'
      | '"'
      | '\''
      | '\\'
      | .
      )
    ;
fragment
CHAR
    : 'a'..'z' | 'A'..'Z'
    ;
WS
    : ( ' ' | '\t' | '\r'? '\n' )+ { $channel = HIDDEN; }
    ;
\a\b\bsurname\d\e\e\e
\F\C\Xprenames\Q\B\Y\U