[antlr-dev] Potential optimization of set matching in lexers

Sam Harwell Thu, 13 Nov 2008 16:59:27 -0800

I made a few changes to address an issue I was seeing in the generated
lexer code: set elements were re-checked against input.LA(1) even in alts
where prediction had already verified the character. I'm not yet
confident that the change can't introduce new bugs, so I'd like some of
you to take a look and see whether it could apply to your targets as
well. While parsing our sources, this update saved tens of millions of
calls to input.LA(x).

If the existing success code in the matchSet StringTemplate is extracted
into its own template, we can optimize the handling of certain set alts.
Here is what the StringTemplate code in the C# port looks like with that
code extracted:

matchSet(s,label,elementIndex,postmatchCode="") ::= <<
<if(label)>
<if(LEXER)>
<label>= input.LA(1);<\n>
<else>
<label>=(<labelType>)input.LT(1);<\n>
<endif>
<endif>
if ( <s> )
{
        <! code originally here moved to matchSetUnchecked !>
        <matchSetUnchecked(...)>
}
else
{
        <ruleBacktrackFailure()>
        MismatchedSetException mse = new MismatchedSetException(null,input);
        <@mismatchedSetException()>
<if(LEXER)>
        recover(mse);
        throw mse;
<else>
        throw mse;
        <! use following code to make it recover inline; remove throw mse;
        recoverFromMismatchedSet(input,mse,FOLLOW_set_in_<ruleName><elementIndex>);
        !>
<endif>
}<\n>
>>

matchSetUnchecked(s,label,elementIndex,postmatchCode="") ::= <<
input.consume();
<postmatchCode>
<if(!LEXER)>
        state.errorRecovery=false;
<endif>
        <if(backtracking)>state.failed=false;<endif>
>> 
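
To make the saving concrete, here is a small Java sketch of a DIGIT+ style loop. It is not ANTLR's generated code or runtime API: CountingStream, matchSetChecked, matchSetUnchecked and lexDigits are hypothetical names chosen for illustration. Prediction already reads LA(1) to decide whether another digit follows, so the checked expansion reads LA(1) a second time before consuming, while the unchecked expansion just consumes.

```java
import java.util.BitSet;

// Hypothetical sketch: CountingStream stands in for a lexer's char stream
// and counts LA() calls. None of these names are ANTLR runtime APIs.
public class SetMatchDemo {
    static class CountingStream {
        private final String text;
        private int pos = 0;
        int laCalls = 0;

        CountingStream(String text) { this.text = text; }

        // Returns the character k positions ahead (1-based), or -1 at EOF.
        int LA(int k) {
            laCalls++;
            int i = pos + k - 1;
            return i < text.length() ? text.charAt(i) : -1;
        }

        void consume() { pos++; }

        int index() { return pos; }
    }

    // Like the checked matchSet expansion: re-test LA(1) before consuming.
    static void matchSetChecked(CountingStream input, BitSet set) {
        int c = input.LA(1);
        if (c >= 0 && set.get(c)) {
            input.consume();
        } else {
            throw new IllegalStateException("mismatched set at index " + input.index());
        }
    }

    // Like matchSetUnchecked: the caller has already proven set membership.
    static void matchSetUnchecked(CountingStream input) {
        input.consume();
    }

    // A DIGIT+ style loop: prediction tests LA(1), then the set is matched.
    static int lexDigits(CountingStream input, BitSet digits, boolean unchecked) {
        int matched = 0;
        while (true) {
            int c = input.LA(1); // prediction: does another digit follow?
            if (c < 0 || !digits.get(c)) {
                break;
            }
            if (unchecked) {
                matchSetUnchecked(input);
            } else {
                matchSetChecked(input, digits);
            }
            matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        BitSet digits = new BitSet();
        digits.set('0', '9' + 1); // toIndex is exclusive

        CountingStream checked = new CountingStream("12345x");
        lexDigits(checked, digits, false);
        CountingStream unchecked = new CountingStream("12345x");
        lexDigits(unchecked, digits, true);

        System.out.println("checked LA calls:   " + checked.laCalls);
        System.out.println("unchecked LA calls: " + unchecked.laCalls);
    }
}
```

For the input "12345x" the checked version makes 11 LA calls and the unchecked version makes 6: one prediction read per matched digit plus the final failing prediction, with the checked path adding one more read per matched digit.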

I then updated the codegen tree walker to use matchSetUnchecked directly
in alts where the prediction code already guarantees success. I also
added fallback code in case a target does not support the optimization
(which, at this point, is every target except my C# one).

protected StringTemplate getTokenElementST( String name,
                                           String elementName,
                                           GrammarAST elementAST,
                                           GrammarAST ast_suffix,
                                           String label )
{
    // BEGIN PIXELMINE: sets optimization
    bool tryUnchecked = false;
    // lexer rule names start with an uppercase letter
    if ( name == "matchSet" && !string.IsNullOrEmpty( elementAST.enclosingRuleName ) &&
         char.IsUpper( elementAST.enclosingRuleName[0] ) )
    {
        tryUnchecked = true;
    }
    // END PIXELMINE

    string suffix = getSTSuffix( elementAST, ast_suffix, label );
    // PIXELMINE: I removed the "name += suffix;" line that was here

    // if we're building trees and there is no label, gen a label
    // unless we're in a synpred rule.
    Rule r = grammar.getRule( currentRuleName );
    if ( ( grammar.BuildAST || suffix.length() > 0 ) && label == null &&
         ( r == null || !r.isSynPred ) )
    {
        label = generator.createUniqueLabel( elementName );
        CommonToken labelTok = new CommonToken( ANTLRParser.ID, label );
        grammar.defineTokenRefLabel( currentRuleName, labelTok, elementAST );
    }

    // BEGIN PIXELMINE: sets optimization
    StringTemplate elementST = null;
    if ( tryUnchecked )
        elementST = templates.getInstanceOf( name + "Unchecked" + suffix );
    // fall back to the checked template; this assumes getInstanceOf
    // returns null when the target does not define the Unchecked variant
    if ( elementST == null )
        elementST = templates.getInstanceOf( name + suffix );
    // END PIXELMINE

    if ( label != null )
    {
        elementST.setAttribute( "label", label );
    }
    return elementST;
}
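
The template selection with its fallback can be sketched in Java as below. TemplateGroup is a hypothetical stand-in for a template group whose lookup returns null for an undefined template, which is what the fallback above assumes; whether a particular StringTemplate version returns null or throws for a missing template is worth checking before relying on it. Note that the C# `name == "matchSet"` compares string contents, so a Java version needs `equals`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fallback lookup in getTokenElementST.
// TemplateGroup is NOT the StringTemplate API; it only models a group
// whose lookup returns null for an undefined template.
public class TemplateFallbackDemo {
    static class TemplateGroup {
        private final Map<String, String> templates = new HashMap<>();

        TemplateGroup define(String name, String body) {
            templates.put(name, body);
            return this;
        }

        // Returns null when this target does not define the template.
        String getInstanceOf(String name) {
            return templates.get(name);
        }
    }

    // Lexer rule names start with an uppercase letter; parser rules do not.
    static boolean isLexerRule(String ruleName) {
        return ruleName != null && !ruleName.isEmpty()
                && Character.isUpperCase(ruleName.charAt(0));
    }

    static String selectTemplate(TemplateGroup group, String name,
                                 String suffix, String enclosingRuleName) {
        String st = null;
        if (name.equals("matchSet") && isLexerRule(enclosingRuleName)) {
            st = group.getInstanceOf(name + "Unchecked" + suffix);
        }
        if (st == null) {
            // Target lacks the unchecked template (or this is not a lexer set).
            st = group.getInstanceOf(name + suffix);
        }
        return st;
    }

    public static void main(String[] args) {
        TemplateGroup csharpTarget = new TemplateGroup()
                .define("matchSet", "checked")
                .define("matchSetUnchecked", "unchecked");
        TemplateGroup javaTarget = new TemplateGroup().define("matchSet", "checked");

        System.out.println(selectTemplate(csharpTarget, "matchSet", "", "DIGIT"));
        System.out.println(selectTemplate(javaTarget, "matchSet", "", "DIGIT"));
        System.out.println(selectTemplate(csharpTarget, "matchSet", "", "expr"));
    }
}
```

Because the unchecked name is only tried for lexer rules and only used when the target defines it, a target without matchSetUnchecked keeps generating the checked matchSet, and parser rules are never affected.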

 

Sam

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev