I had to make a couple tweaks to this. Instead of extracting the
contents of the if() statement to a separate template, I turned
matchSetUnchecked into its own, independent template. Then I was able to
add the <label>=input.LA(1); code to it. In other words, matchSet is
unchanged from its original form, and a new rule matchSetUnchecked is
created from a copy of it with the if/else stripped from it (since the
if condition is known to be true).
Also, the condition for using matchSetUnchecked is different from what I
had originally. The previous version caused problems for filter lexers
since they don't check sets before entering a rule. Here is what it
looks like now (sorry if it's a bit difficult to read):
bool tryUnchecked = false;
if ( name == "matchSet" && !string.IsNullOrEmpty(
elementAST.enclosingRuleName ) && char.IsUpper(
elementAST.enclosingRuleName[0] ) )
{
if ( ( elementAST.Parent.Type == ANTLRLexer.ALT &&
elementAST.Parent.Parent.Parent.Type == RULE &&
elementAST.Parent.Parent.ChildCount == 2 )
|| ( elementAST.Parent.Type == ANTLRLexer.NOT &&
elementAST.Parent.Parent.Parent.Parent.Type == RULE &&
elementAST.Parent.Parent.Parent.ChildCount == 2 ) )
{
// single alt at the start of the rule needs to be checked
}
else
{
tryUnchecked = true;
}
}
Sam
-----Original Message-----
From: Terence Parr [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13, 2008 8:04 PM
To: Sam Harwell
Cc: ANTLR-dev Dev
Subject: Re: [antlr-dev] Potential optimation of set matching in lexers
Looks good. I'll be doing something similar in the Spring when I get
to optimization. :)
Thanks!
Ter
On Nov 13, 2008, at 4:59 PM, Sam Harwell wrote:
> I made a few changes to address an issue I was seeing in the
> generated code. I'm not confident at this point that it can't
> introduce new bugs, so I'd like some of you to take a look and see
> if it could apply to your targets as well. This update saved me tens
> of millions of calls to input.LA(x) while parsing our sources.
>
> If the existing success code in the matchSet StringTemplate is
> extracted to its own StringTemplate, we can optimizing the handling
> of certain set alts. Here is what the StringTemplate code in the C#
> port looks like with the code extracted:
>
> matchSet(s,label,elementIndex,postmatchCode="") ::= <<
> <if(label)>
> <if(LEXER)>
> <label>= input.LA(1);<\n>
> <else>
> <label>=(<labelType>)input.LT(1);<\n>
> <endif>
> <endif>
> if ( <s> )
> {
> <! code originally here moved to matchSetUnchecked !>
> <matchSetUnchecked(...)>
> }
> else
> {
> <ruleBacktrackFailure()>
> MismatchedSetException mse = new
> MismatchedSetException(null,input);
> <@mismatchedSetException()>
> <if(LEXER)>
> recover(mse);
> throw mse;
> <else>
> throw mse;
> <! use following code to make it recover inline; remove
> throw mse;
>
> recoverFromMismatchedSet
> (input,mse,FOLLOW_set_in_<ruleName><elementIndex>);
> !>
> <endif>
> }<\n>
> >>
>
> matchSetUnchecked(s,label,elementIndex,postmatchCode="") ::= <<
> input.consume();
> <postmatchCode>
> <if(!LEXER)>
> state.errorRecovery=false;
> <endif>
> <if(backtracking)>state.failed=false;<endif>
> >>
>
> I then updated the code in the codegen tree walker to directly use
> matchSetUnchecked in alts where the prediction code already
> guarantees success. I also have fallback code should one of the
> targets (or at this point - all of the targets except my C# one) not
> support the optimization.
>
> protected StringTemplate getTokenElementST( String name,
> String elementName,
> GrammarAST elementAST,
> GrammarAST ast_suffix,
> String label )
> {
> // BEGIN PIXELMINE: sets optimization
> bool tryUnchecked = false;
> if ( name == "matchSet" && !
> string.IsNullOrEmpty( elementAST.enclosingRuleName ) &&
> char.IsUpper( elementAST.enclosingRuleName[0] ) )
> {
> tryUnchecked = true;
> }
> // END PIXELMINE
>
> string suffix = getSTSuffix( elementAST, ast_suffix, label );
> // PIXELMINE: I removed the "name += suffix;" line that was here
> // if we're building trees and there is no label, gen a label
> // unless we're in a synpred rule.
> Rule r = grammar.getRule( currentRuleName );
> if ( ( grammar.BuildAST || suffix.length() > 0 ) && label ==
> null &&
> ( r == null || !r.isSynPred ) )
> {
> label = generator.createUniqueLabel( elementName );
> CommonToken labelTok = new CommonToken( ANTLRParser.ID,
> label );
> grammar.defineTokenRefLabel( currentRuleName, labelTok,
> elementAST );
> }
>
> // BEGIN PIXELMINE: sets optimization
> StringTemplate elementST = null;
> if ( tryUnchecked )
> elementST = templates.getInstanceOf( name + "Unchecked" +
> suffix );
> if ( elementST == null )
> elementST = templates.getInstanceOf( name + suffix );
> // END PIXELMINE
>
> if ( label != null )
> {
> elementST.setAttribute( "label", label );
> }
> return elementST;
> }
>
> Sam
> _______________________________________________
> antlr-dev mailing list
> [email protected]
> http://www.antlr.org:8080/mailman/listinfo/antlr-dev
_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev