I had to make a couple tweaks to this. Instead of extracting the
contents of the if() statement to a separate template, I turned
matchSetUnchecked into its own, independent template. Then I was able to
add the <label>=input.LA(1); code to it. In other words, matchSet is
unchanged from its original form, and a new rule matchSetUnchecked is
created from a copy of it with the if/else stripped from it (since the
if condition is known to be true).

Also, the condition for using matchSetUnchecked is different from what I
had originally. The previous version caused problems for filter lexers
since they don't check sets before entering a rule. Here is what it
looks like now (sorry if it's a bit difficult to read):

bool tryUnchecked = false;
if ( name == "matchSet" && !string.IsNullOrEmpty(
elementAST.enclosingRuleName ) && char.IsUpper(
elementAST.enclosingRuleName[0] ) )
{
    if ( ( elementAST.Parent.Type == ANTLRLexer.ALT &&
elementAST.Parent.Parent.Parent.Type == RULE &&
elementAST.Parent.Parent.ChildCount == 2 )
        || ( elementAST.Parent.Type == ANTLRLexer.NOT &&
elementAST.Parent.Parent.Parent.Parent.Type == RULE &&
elementAST.Parent.Parent.Parent.ChildCount == 2 ) )
    {
        // single alt at the start of the rule needs to be checked
    }
    else
    {
        tryUnchecked = true;
    }
}

Sam

-----Original Message-----
From: Terence Parr [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 13, 2008 8:04 PM
To: Sam Harwell
Cc: ANTLR-dev Dev
Subject: Re: [antlr-dev] Potential optimation of set matching in lexers

Looks good.  I'll be doing something similar in the Spring when I get  
to optimization. :)
Thanks!
Ter
On Nov 13, 2008, at 4:59 PM, Sam Harwell wrote:

> I made a few changes to address an issue I was seeing in the  
> generated code. I'm not confident at this point that it can't  
> introduce new bugs, so I'd like some of you to take a look and see  
> if it could apply to your targets as well. This update saved me tens  
> of millions of calls to input.LA(x) while parsing our sources.
>
> If the existing success code in the matchSet StringTemplate is  
> extracted to its own StringTemplate, we can optimizing the handling  
> of certain set alts. Here is what the StringTemplate code in the C#  
> port looks like with the code extracted:
>
> matchSet(s,label,elementIndex,postmatchCode="") ::= <<
> <if(label)>
> <if(LEXER)>
> <label>= input.LA(1);<\n>
> <else>
> <label>=(<labelType>)input.LT(1);<\n>
> <endif>
> <endif>
> if ( <s> )
> {
>         <! code originally here moved to matchSetUnchecked !>
>         <matchSetUnchecked(...)>
> }
> else
> {
>         <ruleBacktrackFailure()>
>         MismatchedSetException mse = new  
> MismatchedSetException(null,input);
>         <@mismatchedSetException()>
> <if(LEXER)>
>         recover(mse);
>         throw mse;
> <else>
>         throw mse;
>         <! use following code to make it recover inline; remove  
> throw mse;
>          
> recoverFromMismatchedSet 
> (input,mse,FOLLOW_set_in_<ruleName><elementIndex>);
>         !>
> <endif>
> }<\n>
> >>
>
> matchSetUnchecked(s,label,elementIndex,postmatchCode="") ::= <<
> input.consume();
> <postmatchCode>
> <if(!LEXER)>
>         state.errorRecovery=false;
> <endif>
>         <if(backtracking)>state.failed=false;<endif>
> >>
>
> I then updated the code in the codegen tree walker to directly use  
> matchSetUnchecked in alts where the prediction code already  
> guarantees success. I also have fallback code should one of the  
> targets (or at this point - all of the targets except my C# one) not  
> support the optimization.
>
> protected StringTemplate getTokenElementST( String name,
>                                            String elementName,
>                                            GrammarAST elementAST,
>                                            GrammarAST ast_suffix,
>                                            String label )
> {
>     // BEGIN PIXELMINE: sets optimization
>     bool tryUnchecked = false;
>     if ( name == "matchSet" && ! 
> string.IsNullOrEmpty( elementAST.enclosingRuleName ) &&  
> char.IsUpper( elementAST.enclosingRuleName[0] ) )
>     {
>         tryUnchecked = true;
>     }
>     // END PIXELMINE
>
>     string suffix = getSTSuffix( elementAST, ast_suffix, label );
> // PIXELMINE: I removed the "name += suffix;" line that was here
>     // if we're building trees and there is no label, gen a label
>     // unless we're in a synpred rule.
>     Rule r = grammar.getRule( currentRuleName );
>     if ( ( grammar.BuildAST || suffix.length() > 0 ) && label ==  
> null &&
>          ( r == null || !r.isSynPred ) )
>     {
>         label = generator.createUniqueLabel( elementName );
>         CommonToken labelTok = new CommonToken( ANTLRParser.ID,  
> label );
>         grammar.defineTokenRefLabel( currentRuleName, labelTok,  
> elementAST );
>     }
>
>     // BEGIN PIXELMINE: sets optimization
>     StringTemplate elementST = null;
>     if ( tryUnchecked )
>         elementST = templates.getInstanceOf( name + "Unchecked" +  
> suffix );
>     if ( elementST == null )
>         elementST = templates.getInstanceOf( name + suffix );
>     // END PIXELMINE
>
>     if ( label != null )
>     {
>         elementST.setAttribute( "label", label );
>     }
>     return elementST;
> }
>
> Sam
> _______________________________________________
> antlr-dev mailing list
> [email protected]
> http://www.antlr.org:8080/mailman/listinfo/antlr-dev

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev

Reply via email to